Chapter 1 Introduction

1.1 Overview

As computers become more integral to daily life, the integrity of computer activities becomes increasingly crucial. To that end, there has been increased research into maintaining integrity in a computer environment.

The definition of integrity from the NIST Workshop on Integrity, January 1989, is:

The property that data, an information process, computer equipment, and/or software, people, etc., or any collection of these entities, meet an a priori expectation of quality that is satisfactory and adequate in some specific circumstance.

The workshop and its related activities were held because of a growing concern for the integrity of information stored in computers and in machine-readable format on storage devices. The threats to this integrity are numerous and serious, ranging from those that can be considered unintentional to those that can be categorized as malice with forethought.

Most of the problems with the integrity of data and programs can be categorized as unintentional. The entry of incorrect data is probably the largest threat to computer integrity, followed closely by other unintentional errors. There is a growing concern about threats to integrity from programs whose actions intentionally do not meet their specifications, known as Trojan horses. A Trojan horse is a piece of code that is surreptitiously placed in a program in order to perform functions not advertised in the program specifications [MAE87]. A particularly dangerous type of Trojan horse is the computer virus. In this research a virus is defined as a program that can "infect" other programs by modifying them to include a, possibly evolved, copy of itself. With the infection property, a virus can spread throughout a computer system or network, using the authorizations of every user who runs it to infect that user's programs. Every program that is infected may also act as a virus, and thus the infection spreads [COH84]. Because viruses can spread so rapidly and have the potential to destroy the integrity of large amounts of data, an effective means must be found to counteract this threat to integrity. This thesis describes research that addresses the problem of detecting virus-infected files using techniques developed to maintain the integrity of data files.

1.2 Models of Integrity

Several models for ensuring the integrity of computerized files have been advanced. Although the Biba model [BIB77] was introduced twelve years ago, the past three years have shown an increased interest in the integrity area. The recently introduced Clark and Wilson model [CLA87] [WIL89] has drawn particular interest. These two models, Biba and Clark & Wilson, are described below.

1.2.1 Biba

The Biba model is based on a definition of integrity as a multivalued quantity, in contrast to the binary property of the NIST workshop's integrity definition. In the Biba model, data and processes are given an integrity label from a range defined for the system. An integrity lattice for the system can be constructed from the integrity labels on the data items. If an implementation of the Biba model meets the model's specification, that implementation ensures that a process cannot reduce the integrity label, and thus the integrity, of a file of data.


The strict Biba model is a dual of the Bell-LaPadula lattice security model [HEN87]. A process in the Biba model is not allowed to write to data that have a higher integrity label (corresponding to the "no write down" property of Bell-LaPadula) and is not allowed to read from data that have a lower integrity label (corresponding to the "no read up" property of Bell-LaPadula). Thus, information at a lower integrity level cannot corrupt information at a higher integrity level. Using a proof by induction, it can be shown that if data is in a valid integrity state, it will remain in a valid state at all future times, provided a Biba model of integrity is imposed.
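The two strict Biba rules reduce to simple label comparisons. The sketch below is a minimal illustration, assuming integer labels in which a larger number means higher integrity (a totally ordered special case of the Biba lattice); the function names are hypothetical.

```python
# Minimal sketch of the strict Biba checks.
# Assumption: integrity labels are integers, larger = higher integrity.


def biba_can_read(subject_label: int, object_label: int) -> bool:
    # "No read down": a subject may only read data whose integrity is at
    # least as high as its own, so low-integrity data cannot influence it.
    return object_label >= subject_label


def biba_can_write(subject_label: int, object_label: int) -> bool:
    # "No write up": a subject may only modify data whose integrity is at
    # most its own, so it cannot corrupt higher-integrity data.
    return object_label <= subject_label


if __name__ == "__main__":
    HIGH, LOW = 2, 1
    print(biba_can_read(HIGH, LOW))   # False
    print(biba_can_write(LOW, HIGH))  # False
    print(biba_can_write(HIGH, LOW))  # True
```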

Biba also proposed two variants of his strict integrity policy: ring integrity and low water mark integrity. Under the ring integrity policy, no restrictions are placed on the reading of data, but the constraints on writing to an object are the same as in the strict integrity policy. Low water mark integrity changes the subject's integrity label to that of the object whenever the object's integrity label is lower than the subject's [HEN87]. There are also other variations of the strict Biba integrity policy [DEN86] [SHI81] [BOY78].
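The two variants can be sketched in the same style. This is an illustration under stated assumptions: integer labels (larger = higher integrity) and, as in Biba's formulation, the low water mark adjustment taking place when the subject reads (observes) an object. The class and function names are hypothetical.

```python
# Sketch of Biba's ring and low water mark variants.
# Assumptions: integer labels (larger = higher integrity); the low water
# mark adjustment happens when the subject reads an object.


def ring_can_read(subject_label: int, object_label: int) -> bool:
    # Ring policy: reading is unrestricted.
    return True


def ring_can_write(subject_label: int, object_label: int) -> bool:
    # Writing is constrained exactly as in strict Biba ("no write up").
    return object_label <= subject_label


class LowWaterMarkSubject:
    """Hypothetical subject whose label can only fall, never rise."""

    def __init__(self, label: int) -> None:
        self.label = label

    def read(self, object_label: int) -> None:
        # Reading is permitted, but the subject drops to the object's
        # label when the object has lower integrity.
        self.label = min(self.label, object_label)

    def can_write(self, object_label: int) -> bool:
        # Writes still obey "no write up" relative to the current label.
        return object_label <= self.label


if __name__ == "__main__":
    editor = LowWaterMarkSubject(label=3)
    print(editor.can_write(3))  # True
    editor.read(1)              # observes low-integrity data
    print(editor.label)         # 1
    print(editor.can_write(3))  # False
```

The last three lines show the migration to a lower level that low water mark integrity causes: once the subject has read low-integrity data, it can no longer modify data at its original level.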

There are several problems with the Biba model and its variants. Strict Biba does not appear flexible enough to be useful in practical applications, since these applications must have read and write access to various system tables and internal data structures in order to perform their functions [HEN87]. Another serious problem with a strict Biba integrity policy occurs when it is combined with the Bell-LaPadula security model: the combination isolates data at the lattice nodes, partitioning systems into subsets that are closed under transitivity [COH84]. Strict Biba also has no automatic mechanism to incorporate new data into the hierarchy. When flexibility is introduced to counter the constraints of the strict Biba model, either data migrates to a lower level, as occurs with low water mark integrity, or integrity-corrupting mechanisms can migrate across integrity levels [COH84]. Managing a Biba-type implementation is also difficult. Most lattice model (Biba) designs to date have considered 64 categories to be a large number [KAR88]. Large systems will have thousands of distinct categories, because effectively limiting the operations between a subject and similar objects that must be treated differently requires a separate label for each. Managing large numbers of categories is not unique to Biba systems and will exact performance penalties on all general integrity policies.
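The isolation effect can be seen in a small sketch. It assumes, for illustration only, that a single integer label per entity serves as both its security level (for Bell-LaPadula) and its integrity level (for Biba); under that assumption the two policies together permit reads and writes only between entities at the same level.

```python
# Sketch of why combining strict Biba with Bell-LaPadula can isolate data.
# Assumption (illustrative): one integer label per entity serves as both
# its security level and its integrity level.


def blp_allows(op: str, subject: int, obj: int) -> bool:
    # Bell-LaPadula: no read up, no write down.
    return obj <= subject if op == "read" else obj >= subject


def biba_allows(op: str, subject: int, obj: int) -> bool:
    # Strict Biba: no read down, no write up.
    return obj >= subject if op == "read" else obj <= subject


def combined_allows(op: str, subject: int, obj: int) -> bool:
    return blp_allows(op, subject, obj) and biba_allows(op, subject, obj)


if __name__ == "__main__":
    levels = range(1, 4)
    for op in ("read", "write"):
        allowed = sorted((s, o) for s in levels for o in levels
                         if combined_allows(op, s, o))
        # Only same-level pairs survive: each level is a closed subset.
        print(op, allowed)
```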

1.2.2 Clark & Wilson

The Clark & Wilson model ensures that the integrity of systems and data remains predictably constant and changes only in highly controlled and structured ways. Though the original Clark & Wilson paper [CLA87] was expressed in terms of nine rules, Lee captured the essence as:


All data (of interest) must be modified by, and only by, authorized
well-formed transactions where the principle of separation of duties is
used to limit who can perform what transactions and make what
changes to the system. [LEE88]


The Clark & Wilson model provides internal consistency and good correspondence to real-world expectations for systems and data [WIL89]. Correspondence to real-world expectations is accomplished by Integrity Verification Procedures (IVPs). These procedures check the model formed by the data in the computer system against the real-world perception of that model. An IVP not only provides correspondence to the real world but also checks the internal consistency of the data. After an IVP has been run, the data has integrity. An example of a practical IVP is physically counting the inventory at a location and checking that the computer system designed for tracking that inventory corresponds to what was physically found.
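The inventory example can be expressed as a small comparison routine. This is a hypothetical sketch: `inventory_ivp` and the dictionary representation are illustrative assumptions, and it covers only the real-world correspondence check, not internal consistency checks.

```python
# Hypothetical IVP in the spirit of the inventory example: compare what the
# system records against a physical count and report every mismatch.
from typing import Dict, Tuple


def inventory_ivp(recorded: Dict[str, int],
                  counted: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {item: (recorded, counted)} for every item that disagrees.

    An item missing from either mapping is treated as a quantity of 0.
    An empty result means the records correspond to the real world.
    """
    discrepancies = {}
    for item in recorded.keys() | counted.keys():
        on_record = recorded.get(item, 0)
        on_shelf = counted.get(item, 0)
        if on_record != on_shelf:
            discrepancies[item] = (on_record, on_shelf)
    return discrepancies


if __name__ == "__main__":
    recorded = {"bolts": 500, "nuts": 420, "washers": 1000}
    counted = {"bolts": 500, "nuts": 400, "washers": 1000}
    print(inventory_ivp(recorded, counted))  # {'nuts': (420, 400)}
```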

A crucial second feature of the Clark & Wilson model is controlling change. Between IVP executions on a set of data, any changes to the data must be strictly controlled in order to maintain internal consistency and thus integrity.

Controlling change can take four forms, determined by the structure and use of the data: prevention of change, attribution of change, constraint of change, and partition of change.

For data that does not change in the real world, prevention of change is desirable. Under the Clark & Wilson model, if it can be shown that the data was correct at one time and has not been changed since, then the integrity of the data is maintained. An example of a file for which prevention of change is appropriate would be a file of executable programs that rarely change.

For unstructured data, integrity can be determined if the data and its authors (original and of changes) are bound together in an unforgeable way. If the data has been changed, integrity can be maintained by binding the history of the changes, and the authors of those changes, to the data. Examples of data appropriate for attribution of change would be memos or reports.

Highly structured data, such as accounting records, should only be modified in very controlled ways. When only certain programs and users are allowed to modify the data, the method is called constraint of change.



In order to prevent fraud, changing some types of data should require that the change be authorized by two different people, i.e., partition of change. Money transfers by wire should be controlled by this separation of duty.
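Partition of change for a wire transfer can be sketched as an object that refuses to execute until two different authorized people have approved it. The `WireTransfer` class, the staff set, and the fixed two-approval rule are illustrative assumptions, not a mechanism taken from [CLA87].

```python
# Sketch of partition of change (separation of duty) for a wire transfer.
# Assumptions: class name, authorized-staff set, and two-approval rule
# are illustrative only.
from typing import Set


class WireTransfer:
    """Hypothetical transfer that executes only after two distinct approvals."""

    REQUIRED_APPROVALS = 2

    def __init__(self, amount: int, authorized_staff: Set[str]) -> None:
        self.amount = amount
        self.authorized_staff = authorized_staff
        self.approvals: Set[str] = set()

    def approve(self, person: str) -> None:
        if person not in self.authorized_staff:
            raise PermissionError(f"{person} may not authorize transfers")
        # A set records each person once, so approving twice counts once.
        self.approvals.add(person)

    def execute(self) -> str:
        if len(self.approvals) < self.REQUIRED_APPROVALS:
            raise PermissionError("needs approval from two different people")
        return f"transferred {self.amount}"
```

A single clerk approving twice still leaves the transfer blocked; only a second, different authorized person can release it.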

1.3 Prevention of Change

This section elaborates on the concepts involved in the prevention of change, as it is the detection of change that we wish to focus upon. To prevent change, the system must either prohibit change through access control or identify that a change has occurred and take appropriate action.

1.3.1 Access Control

It is possible to design a system in which there is a category of data that should not be changed. Prevention of modification is accomplished by some form of access matrix model. The access matrix model consists of a triple (Subject, Object, Access Matrix). Subjects are active entities. Objects are protected entities to which access must be controlled, and the Access Matrix is a matrix in which rows correspond to subjects and columns correspond to objects, where an entry stores the access rights of the subject to the object [MIZ87]. Rights are the operations that the subject can perform on the object. Since the matrix tends to be very sparse (i.e., most subject-object pairs have no rights), it is typically implemented either as a list of the subjects that have access rights to an object (an Access Control List) or as a list of the objects to which a subject has rights (a Capability List). The two methods yield major differences in the type of protection provided.
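A sparse access matrix and its two standard decompositions can be sketched with dictionaries. The representation below, a dict keyed by (subject, object) holding only nonempty rights sets, is an assumption chosen for illustration.

```python
# Sketch of a sparse access matrix and its two standard decompositions.
# Assumption: only nonempty entries are stored, keyed by (subject, object).
from collections import defaultdict
from typing import Dict, Set, Tuple

Matrix = Dict[Tuple[str, str], Set[str]]


def to_acls(matrix: Matrix) -> Dict[str, Dict[str, Set[str]]]:
    # Column view: for each object, the subjects and their rights.
    acls: Dict[str, Dict[str, Set[str]]] = defaultdict(dict)
    for (subject, obj), rights in matrix.items():
        acls[obj][subject] = rights
    return dict(acls)


def to_capabilities(matrix: Matrix) -> Dict[str, Dict[str, Set[str]]]:
    # Row view: for each subject, the objects and its rights to them.
    caps: Dict[str, Dict[str, Set[str]]] = defaultdict(dict)
    for (subject, obj), rights in matrix.items():
        caps[subject][obj] = rights
    return dict(caps)


if __name__ == "__main__":
    matrix = {
        ("alice", "payroll.dat"): {"read", "write"},
        ("bob", "payroll.dat"): {"read"},
        ("bob", "report.txt"): {"read", "write"},
    }
    print(to_acls(matrix)["payroll.dat"])   # entries for alice and bob
    print(to_capabilities(matrix)["bob"])   # entries for both files
```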

Access Control Lists (ACLs) are the most common form of integrity (and security) control. An ACL is a column-based view of the Access Matrix, derived from the nonempty entries in an object's column. An object has a list of (subject, right) pairs indicating the subjects that have access to the object and the rights of each subject. Rights typically are read, write, and execute. If a subject, s, tries to access an object, the access control list for that object is searched. If no entry for s exists, or if an entry exists but the requested rights do not occur in it, the request is refused. Typically a subject acting on the request of another subject obtains the rights of the originating subject. For example, a user can execute a compiler, which will then have all the rights of the user.
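The ACL check, and the way a program inherits the rights of the user who runs it, can be sketched as follows. The names `access_allowed` and `run_program` are hypothetical; the point is that the check is made against the invoking user's identity, never the program's.

```python
# Sketch of an ACL check in which a program inherits its invoker's rights.
# Names are hypothetical.
from typing import Dict, Set

ACL = Dict[str, Set[str]]  # subject -> rights, stored with one object


def access_allowed(acl: ACL, subject: str, right: str) -> bool:
    # Refuse if the subject has no entry, or if the entry lacks the right.
    return right in acl.get(subject, set())


def run_program(invoking_user: str, acl: ACL, right: str) -> bool:
    # The program acts with the rights of the user who started it, so the
    # check uses the user's identity, not the program's.
    return access_allowed(acl, invoking_user, right)


if __name__ == "__main__":
    file_acl = {"alice": {"read", "write"}, "bob": {"read"}}
    print(run_program("alice", file_acl, "write"))  # True
    print(run_program("bob", file_acl, "write"))    # False
```

Because `run_program` never considers which program is running, a compiler started by alice may write the protected file whether or not it contains a Trojan horse.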

ACLs suffer from integrity problems in both implementation and theory. Implementations are typically very coarse-grained, both in the size of the objects protected and in the small number of rights that can be granted. ACLs normally are applied at the file level, so they cannot maintain integrity for a part of a file that needs to be treated differently for access purposes. This is compounded by the small number of rights that are used. The combination of read and write is sufficient to accomplish all functions of a computer system, but if only these are used (or even with the addition of execute), the user cannot be sure that data is modified in a manner that maintains integrity. The implementation problem is not one of ACL theory: it should be possible to decrease the size of the objects that are protected and increase the number of rights available, though at an increased cost in storage and efficiency.

The transfer of rights, which is part of ACL theory itself, is a much more serious flaw with respect to integrity. Programs operating on a user's behalf have all the rights of the user. Any data that the user can change can also be changed by any program executed for the user. This makes ACLs very vulnerable to any program that performs a surreptitious or unadvertised function, i.e., a Trojan horse. If a Trojan horse resides in the C compiler, it has access rights to all the files to which the user has access rights. Thus it can modify or delete any object to which the user has write access.

The alternative form of the Access Matrix, viewed on a row basis, is the Capability List. A subject has a list of the objects to which it has capabilities (rights); this list defines the domain of the subject [MIZ87]. When an object is invoked, the system determines whether it is in the capability list of the subject and, if so, allows the operation to continue. The Hydra implementation [COH75] allows rights amplification, which handles abstract data types easily. A good implementation of Capability Lists provides an excellent means of integrity control, because it naturally provides a mechanism for each program to be executed in the smallest possible domain [MIZ87]. Due to other considerations, such as the concept of ownership, very few systems use Capability Lists.
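The smallest-domain property can be sketched by giving each running program its own capability list rather than the full rights of its user. The `Domain` class and the compiler's capabilities are hypothetical, chosen to mirror the compiler example above.

```python
# Sketch of capability lists giving each program its own small domain.
# Assumption: each running program holds its own set of (object, right)
# capabilities instead of inheriting everything its user can access.
from typing import FrozenSet, Set, Tuple

Capability = Tuple[str, str]  # (object, right)


class Domain:
    def __init__(self, capabilities: Set[Capability]) -> None:
        self.capabilities: FrozenSet[Capability] = frozenset(capabilities)

    def invoke(self, obj: str, right: str) -> bool:
        # The operation continues only if the capability is in the list.
        return (obj, right) in self.capabilities


if __name__ == "__main__":
    # Hypothetical compiler domain: read the source, write the object file.
    compiler = Domain({("prog.c", "read"), ("prog.o", "write")})
    print(compiler.invoke("prog.o", "write"))      # True
    print(compiler.invoke("thesis.txt", "write"))  # False
```

Even if this compiler contained a Trojan horse, it could not modify `thesis.txt`, although the user who invoked it might be allowed to.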

There have been attempts to provide the integrity protection of Capability Lists without their drawbacks. Two examples are the Four-tuple ACL [MIZ87] and the Access Control Triple [WIL89]. In the Four-tuple ACL, each subject in an ACL entry is represented by a four-tuple of user ID, class ID, module ID, and exported procedure name. This limits the domain available to Trojan horses to the same degree as the Hydra system. It also provides control over users, because users can only view or change data through the levels of the subject IDs. A simpler concept is the Access Control Triple, which binds user, program, and data. Its flexibility is not as great as that of the Four-tuple ACL, since fewer groupings are possible, but it is easier to implement.
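An Access Control Triple check can be sketched as membership in a set of certified (user, program, data) combinations; this is also a direct way to express constraint of change. The triples and program names below are invented for illustration.

```python
# Sketch of an Access Control Triple check: a change is allowed only when
# the (user, program, data) combination has been explicitly certified.
# The triples below are invented for illustration.
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]  # (user, program, data object)

CERTIFIED: FrozenSet[Triple] = frozenset({
    ("clerk", "post_ledger", "ledger.dat"),
    ("supervisor", "correct_entry", "ledger.dat"),
})


def may_modify(user: str, program: str, data: str) -> bool:
    return (user, program, data) in CERTIFIED


if __name__ == "__main__":
    print(may_modify("clerk", "post_ledger", "ledger.dat"))  # True
    print(may_modify("clerk", "text_editor", "ledger.dat"))  # False
```

A Trojan horse hidden in a text editor run by the clerk gains nothing here, because (clerk, text_editor, ledger.dat) was never certified, even though the clerk is allowed to modify the ledger through another program.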


1.3.2 Checksum Techniques

Another method of ensuring that data has not changed is to attach additional information that assures, at some level of confidence, that the data has not changed. This is typically in the form of a checksum. A checksum, or digital signature, is any fixed-length block functionally dependent on every bit of the message, so that different messages will have different checksums with high probability [DEN82]. A checksum can be evaluated on two features: its ability to prevent forgery and the computational complexity of the algorithm that creates it. Checksums can be computed in two basic ways: using cryptography or using a deterministic (noncryptographic) algorithm.
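The two kinds of checksum can be contrasted with Python's standard library. These are substitutions, not the algorithms discussed in this thesis: `zlib.crc32` stands in for a deterministic checksum, and HMAC-SHA-256 (`hmac` with `hashlib.sha256`) stands in for the RSA- and DES-based cryptographic checksums discussed in the following paragraphs. Both produce a fixed-length value that changes when a single bit of the data changes.

```python
# Sketch contrasting a noncryptographic and a cryptographic checksum.
# Substitutions: zlib.crc32 for a deterministic checksum, HMAC-SHA-256 for
# a key-dependent cryptographic checksum.
import hashlib
import hmac
import zlib


def crc_checksum(data: bytes) -> int:
    # 32-bit value that anyone can compute; no secret is involved.
    return zlib.crc32(data) & 0xFFFFFFFF


def keyed_checksum(key: bytes, data: bytes) -> bytes:
    # 256-bit value that cannot be recomputed without the key.
    return hmac.new(key, data, hashlib.sha256).digest()


if __name__ == "__main__":
    original = b"MOV AX, 1"
    tampered = bytes([original[0] ^ 0x01]) + original[1:]  # flip one bit
    key = b"hypothetical secret key"
    print(crc_checksum(original) != crc_checksum(tampered))                # True
    print(keyed_checksum(key, original) != keyed_checksum(key, tampered))  # True
```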

Cryptography is defined as the methods and processes of transforming an intelligible message into an unintelligible form, and of reconverting the unintelligible form into the original message by reversing the transformation. The original message is referred to as the plaintext, and the enciphered message is called the ciphertext. A cipher system consists of two items: (1) a set of rules that comprise the basic cryptographic process (called the general system, which is agreed upon in advance and is constant in nature), and (2) a key, which may be variable [KAT73].

Converting plaintext to ciphertext is known as encryption, while converting ciphertext back to plaintext is known as decryption. The process can be described by the transformation plaintext → ciphertext → plaintext, or in other terms: f(plaintext, encryption key) = ciphertext; f(ciphertext, decryption key) = plaintext. If the encryption key is not equal to the decryption key, the cryptographic system is known as a public key cryptography system. If the encryption key is the same as the decryption key, the system is known as a private key cryptography system. When the keys are different, it is possible to broadcast or distribute (make public) the encryption key so that other parties can send messages that only the parties knowing the decryption key can convert back to plaintext. In private key systems, since the encryption and decryption keys are identical, the key must be kept secret (or private) in order to prevent unauthorized parties from deciphering the ciphertext.

Cryptographic checksum techniques use encryption in some manner to calculate the checksum. Typically a public key algorithm such as the Rivest-Shamir-Adleman (RSA) scheme [RIV79], or a private key algorithm such as the Data Encryption Standard (DES) [DEN82] in feedback mode, is used to produce a 32- to 128-bit value called the checksum. The checksum can be stored with the data that was checksummed or in a safe location (safe from surreptitious modification). Storing cryptographic checksums separately is more secure in terms of forgeability. Since a cryptographic checksum requires a key, forging one is a two-step process when the checksum is stored separately. First, the key must be determined; second, a different set of data (or a modification of the same data) with the same checksum must be found to substitute in place of the real data. If the checksum is stored with the data, or in any other modifiable location, then only the key must be known, since any data with a legal checksum can replace the original data.
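The difference between the two storage choices can be sketched directly. HMAC-SHA-256 again stands in for a DES or RSA cryptographic checksum, and the two dictionaries are hypothetical models of a modifiable file record and a protected checksum store.

```python
# Sketch of why storage location matters once the key is known.
# Assumptions: HMAC-SHA-256 stands in for a DES/RSA checksum; the dicts
# model a modifiable file record and an unmodifiable checksum store.
import hashlib
import hmac


def keyed_checksum(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, stored: bytes) -> bool:
    return hmac.compare_digest(keyed_checksum(key, data), stored)


key = b"hypothetical key"
original = b"original program"
protected_store = {"prog": keyed_checksum(key, original)}  # attacker cannot write
file_record = {"data": original, "checksum": keyed_checksum(key, original)}

# An attacker who has learned the key and can modify the file record:
forged = b"forged program"
file_record["data"] = forged
file_record["checksum"] = keyed_checksum(key, forged)  # simply recomputed

print(verify(key, file_record["data"], file_record["checksum"]))  # True: forgery accepted
print(verify(key, file_record["data"], protected_store["prog"]))  # False: forgery detected
```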

When the checksum is stored with the data, the cryptographic checksum must be secure against known-plaintext attacks, and key management must be secure. If the key is known to an attacker but the stored checksum cannot also be rewritten, then on average it will take $2^{n-1}$ mutations of the desired forgery to insert it using the brute force attack described in section 1.3.3.1, where n is the length of the checksum in bits. That is, the checksum of each mutation has a probability of $2^{-n}$ of matching the stored checksum, and there is a 50% chance of a match after about $\ln(2) \cdot 2^{n}$ mutations. To increase security, the file can be checksummed and then encrypted in an attempt to foil a plaintext attack. Encrypting the file every time it is used would probably be considered undesirable on all but the fastest computers.
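As a worked check of these figures, the derivation below computes the number of mutations k needed for a 50% chance of a match under two idealizations. Independent trials give the $\ln(2) \cdot 2^{n}$ figure; assuming instead that every mutation yields a distinct checksum (an ideal even mapping) gives $2^{n-1}$, the figure used in section 1.3.3.1. This two-case framing is an editorial reading, not the thesis's own derivation.

```latex
\begin{aligned}
&\text{Independent trials, each matching with probability } 2^{-n}:\\
&\quad P(k) = 1 - \left(1 - 2^{-n}\right)^{k} \approx 1 - e^{-k/2^{n}},
  \qquad P(k) = \tfrac{1}{2} \;\Rightarrow\; k \approx \ln(2)\cdot 2^{n}\\
&\text{Every mutation yields a distinct checksum (ideal even mapping):}\\
&\quad P(k) = \frac{k}{2^{n}},
  \qquad P(k) = \tfrac{1}{2} \;\Rightarrow\; k = 2^{n-1}\\
&n = 16:\quad \ln(2)\cdot 2^{16} \approx 45{,}426,
  \qquad 2^{15} = 32{,}768
\end{aligned}
```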

The drawback of cryptographic checksums is the high computational complexity of the algorithms. Encryption is typically a very computationally intensive activity, leading to very slow checksum computation [HAR85]. Computing a secure cryptographic checksum using RSA can take minutes or even hours for data of reasonable length. Cohen describes a hardware implementation with a speed of 6,500 bits/sec [COH86]. This speed is inadequate for practical use.

DES is less secure but much faster, especially if implemented in hardware. However, DES has the problem of private key management, which is required because the same key is used to encode and decode a message. Therefore the key cannot be stored where it can be accessed by an attacker. Denying the attacker access implies that the key must also be inaccessible to the checksum routine. The practical implication is that the key must be entered each time the checksum routine is executed.

Noncryptographic checksums do not provide the same degree of security against forgery as cryptographic checksums whose checksum is stored in a secure place. A noncryptographic checksum can be considered equivalent to a cryptographic checksum with a disclosed key (like the public keys in public key encryption). Since a noncryptographic algorithm does not need to be designed to prevent discovery of a key, such algorithms are typically much less computationally complex, which translates into much faster operation.

1.3.3 Attacks against Checksums

In this research, three types of attack by a forger on a set of data and its generated checksum are considered. All three attacks assume that the attacker knows the checksum algorithm, can change the set of data, and can read the checksum. The three categories of attack, discussed below, are the brute force attack, the birthday attack, and the trap door attack.

1.3.3.1 Brute Force Attack

A brute force attack involves generating many different sets of data until one is found that has the same checksum as the original set of data. Formally, given a set of data x and a checksum algorithm f(x) = y, determine an x' such that f(x') = y. The set of data x', which has the same checksum as the original set x, is inserted in place of the original. Because x' has the same checksum as x, it is not detected as a forgery. A more likely alternative to generating many unrelated sets of data is for the forger to insert the desired data into the original set of data and then mutate the rest of the original data until a checksum match is found. This mutation technique allows the forger to change only small sections of the data while keeping the rest unchanged, so the user of the data may remain unaware of the forgery because most of the data used is unchanged. If the checksum algorithm provides an even mapping, described in section 2.1.1, then a forger needs to generate on average $2^{n-1}$ sets of data, where n is the number of bits in the checksum, before finding a checksum that matches the checksum of the original data. For instance, a 16-bit checksum would require a forger to generate 32,768 sets of data before there is a 50% probability of finding a checksum match.
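The mutation technique can be demonstrated against a deliberately short checksum. In this sketch the top n bits of SHA-256 stand in for an evenly mapping n-bit checksum (an assumption, chosen so the search finishes quickly); the inserted data is kept intact and only a 4-byte filler field is mutated. The helper names are hypothetical.

```python
# Sketch of a brute force forgery against a short checksum.
# Assumption: the top n bits of SHA-256 stand in for an evenly mapping
# n-bit checksum. Helper names are hypothetical.
import hashlib
from typing import Optional, Tuple


def short_checksum(data: bytes, n_bits: int = 16) -> int:
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def brute_force_forgery(original: bytes, payload: bytes, n_bits: int = 16,
                        max_tries: int = 1 << 24
                        ) -> Tuple[Optional[bytes], int]:
    target = short_checksum(original, n_bits)
    for counter in range(max_tries):
        # Keep the original data and the inserted payload intact; mutate
        # only a 4-byte filler field until the checksum matches.
        candidate = original + payload + counter.to_bytes(4, "big")
        if short_checksum(candidate, n_bits) == target:
            return candidate, counter + 1
    return None, max_tries


if __name__ == "__main__":
    forged, tries = brute_force_forgery(b"original data", b" + inserted data")
    # For 16 bits, tries is typically on the order of 2**16.
    print(forged is not None, tries)
```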

1.3.3.2 Birthday Attack

The birthday attack is a forgery carried out by the originator of the data. A birthday attack involves generating many variations of an original set of data, with their checksums, and many variations of the set of data to be inserted, with their checksums. Since any pair of original data and forged data with matching checksums provides a successful forgery, the number of variations that must be generated is greatly reduced. The birthday attack proceeds as follows:

1) The attacker secretly prepares a number of subtle and inconsequential changes to the valid set of data and calculates a checksum for each one.

2) An equally large number of variations of the bogus data set is generated, along with the checksum for each one.

3) The checksums generated in step 1 are compared against the checksums generated in step 2.

4) If no match is found, additional variations are generated until a match is found.

5) The real data set that shares its checksum with a bogus data set is placed on the system. At a later time, the bogus data set with the same checksum is substituted.

The birthday attack will succeed in producing a forgery after, on average, about $2^{n/2}$ checksums are generated, compared with $2^{n-1}$ for the brute force attack described in section 1.3.3.1. For a 16-bit checksum, the number of checksums that must be generated on average for a birthday attack is only 256, compared with 32,768 for the brute force attack.
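Steps 1 through 5 can be sketched with the same hypothetical short checksum (the top n bits of SHA-256, an assumption for illustration). A 4-byte counter appended to each text plays the role of the "subtle and inconsequential" change; each new checksum is compared against every checksum already generated for the other set.

```python
# Sketch of the birthday attack steps against a short checksum.
# Assumptions: top n bits of SHA-256 as the checksum; an appended 4-byte
# counter models an inconsequential change. Names are hypothetical.
import hashlib
from typing import Dict, Optional, Tuple


def short_checksum(data: bytes, n_bits: int = 16) -> int:
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def birthday_forgery(valid: bytes, bogus: bytes, n_bits: int = 16,
                     max_variants: int = 1 << 20
                     ) -> Optional[Tuple[bytes, bytes, int]]:
    valid_by_sum: Dict[int, bytes] = {}
    bogus_by_sum: Dict[int, bytes] = {}
    for i in range(max_variants):
        # Steps 1 and 2: one new variant of each data set, with checksums.
        tweak = i.to_bytes(4, "big")
        v, b = valid + tweak, bogus + tweak
        cv, cb = short_checksum(v, n_bits), short_checksum(b, n_bits)
        valid_by_sum.setdefault(cv, v)
        bogus_by_sum.setdefault(cb, b)
        # Step 3: compare with every checksum from the other set so far.
        if cv in bogus_by_sum:
            return v, bogus_by_sum[cv], i + 1
        if cb in valid_by_sum:
            return valid_by_sum[cb], b, i + 1
        # Step 4: no match yet, so generate more variations.
    return None


if __name__ == "__main__":
    result = birthday_forgery(b"Pay $10 to Alice.", b"Pay $9999 to Mallory.")
    if result is not None:
        real, fake, variants = result
        # Step 5: place `real` on the system now, substitute `fake` later.
        print(variants)  # typically a few hundred variants per set
```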

1.3.3.3 Trap Door Attacks

A trap door attack is a variation of the brute force attack. A trap door attack becomes possible when the forger can invert the checksum algorithm to determine a set of data that produces the same checksum as the original data. Given the checksum algorithm f(x) = y, a trap door exists if it is possible to determine a function g(y) = x', where x' is one or more sets of data satisfying f(x') = y, or equivalently f(g(y)) = y. This g() is known as the inverse of f(). If g(y) can be determined, then the checksum algorithm is susceptible to a trap door attack, since the forger can generate sets of data that match the checksum of the original.

Trap door attacks are much less expensive in terms of computational effort than brute force attacks, because a forgery is generated each time the inverse function is used. It is possible not only to generate forgeries but also to analyze them for their usefulness to the forger. If an attacker wishes to insert a bit pattern into a set of data at any location, he would use the inverse function to generate forgeries with the same checksum as the original until the desired bit pattern occurred in one of them. The attacker would then insert that forgery in place of the original data.

It is very difficult to show that a trap door does not exist, since there is no standard method for determining whether a trap door exists.
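A simple additive checksum shows how cheap a trap door makes forgery. The checksum below, the sum of 16-bit words modulo $2^{16}$, is a hypothetical example chosen because its inverse g(y) is trivial: one compensating word appended to any payload forces the checksum to any target value y, so each call produces a forgery with no search at all.

```python
# Sketch of a trap door: an additive checksum whose inverse g(y) is trivial.
# The checksum and function names are hypothetical illustrations.


def additive_checksum(data: bytes) -> int:
    """f(x): sum of big-endian 16-bit words modulo 2**16."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total = (total + int.from_bytes(data[i:i + 2], "big")) & 0xFFFF
    return total


def inverse(target: int, payload: bytes) -> bytes:
    """g(y): build data that starts with payload and has checksum target."""
    if len(payload) % 2:
        payload += b"\x00"  # keep the compensating word 16-bit aligned
    compensation = (target - additive_checksum(payload)) & 0xFFFF
    return payload + compensation.to_bytes(2, "big")


if __name__ == "__main__":
    original = b"legitimate program"
    y = additive_checksum(original)
    forged = inverse(y, b"unauthorized code")
    print(additive_checksum(forged) == y)  # True, after a single computation
```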

1.3.3.4 Comparison of Attacks

Of the three attacks (brute force, birthday, and trap door), the trap door attack is the most serious. As discussed, the birthday attack is carried out by the originator of the data and so is not a genuine threat when the author of the data is trusted. The brute force attack is a good benchmark for general forgery, but the effort to generate a single forgery is high, and the effort to generate a forgery that is useful to the attacker is very high. In contrast, once a trap door is found, the trap door attack is a very serious threat: the effort to generate forgeries is small compared to the brute force attack, and the g(y) function can be used to generate possible forgeries until a virus is formed. Any checksum algorithm used to protect against forgery should be free from trap doors.

1.4 Viruses
1.4.1 Description

A virus is a program that can "infect" other programs by modifying them to include a, possibly evolved, copy of itself [COH84]. A virus typically has the following capabilities:

- identification - it can identify other files that can be modified.

- infection - it can modify zero or more of the identified files in any execution.

- action - it can take an action. Whether to take an action, and which action to take, can be based upon the value of a trigger, which is usually the satisfaction of a logical expression often based on external information, e.g., the date.

Viruses may have a "time bomb" feature, such that when a logical expression is satisfied a specified action is taken. Such actions in recent viruses have ranged from displaying a message of world peace on the screen to reformatting the disk.
A typical virus exists as a code segment, usually placed at the beginning of a useful program, so that when the useful program is executed the virus is executed as well. When the virus code segment runs, it identifies programs it could infect (replicate itself into) and then decides whether to insert or append a copy of itself into the machine language code of one or more of them. When one of the newly infected programs is executed, the process repeats. Through this infection property, a virus can spread throughout a computer system or network, using the authorizations of every user who runs it to infect that user's programs. Every infected program may in turn act as a virus, and so the infection spreads [COH84]. The trigger mechanism is executed as part of the virus code segment and determines what additional action the virus takes. For example, a virus might erase all the files accessible to it on any Friday that falls on the 13th of the month.
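The trigger is simply a logical expression over external information. As a minimal sketch, assuming the date is the only input, the Friday-the-13th condition from the example can be written as a pure predicate. The name `trigger_fires` is hypothetical, and the sketch only evaluates the condition; it performs no action.

```python
from datetime import date

def trigger_fires(today: date) -> bool:
    """True when the example trigger condition holds: a Friday
    (date.weekday() == 4) that falls on the 13th of the month."""
    return today.weekday() == 4 and today.day == 13

for d in (date(1989, 1, 13), date(1989, 10, 13), date(1989, 10, 12)):
    print(d.isoformat(), trigger_fires(d))
```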

To hide the existence and spread of a virus, its designers can make it more complex. Features of more complex viruses include: ensuring that files already infected are not reinfected, not infecting additional programs every time the host code of the virus is executed, mutating the code of the virus while preserving its functionality, and searching for threats to the virus and disabling them.

Most current viruses appear to be relatively simple, but more complex viruses with some or all of the features mentioned above are likely to present threats in the future. Though advanced viruses will present formidable threats, they must draw on the resources of the computer system where they are running. Viruses therefore do not have unlimited resources available to provide defenses or to break checksum detection techniques, and this limitation makes it possible to use noncryptographic checksums to tell whether a program has been infected. A virus that did consume large amounts of resources would degrade the system noticeably, calling attention to itself and speeding its eventual eradication by system administrators.

1.4.2  Current  means  of  control 

As  expected,  methods  of  protecting  data  from  modification  also  provide  protection  from 
viruses.  There  are  several  methods  of  protecting  files  against  viruses.  These  include: 
access  control,  virus  filters,  snapshots,  runtime  models  and  encryption. 

Access control can do much to limit the spread and damage caused by viruses. Specifically, Capability Lists, or systems with similar benefits, provide the most comprehensive protection from viruses. In a Capability List system, viruses are essentially limited to the domain in which their host program is allowed to execute. Unfortunately, capability lists exist on only a few computer systems. Access Control Lists (ACLs) are the dominant form of access control protection, and they do not prevent the spread of viruses because of the large domain in which programs operate. On a typical ACL system, a program executed by a user has the same rights as that user. Thus, a program not owned by, but executed by, a user can spread a virus to the user's files. Even ACL systems designed for security can allow viruses to spread [COH84].

A virus filter is a program that takes a suspect program and determines whether it contains a virus. Deciding whether an arbitrary program contains a virus is equivalent to the Halting Problem [COH84], so writing an all-encompassing virus filter is impossible.
It is possible, however, to write a virus filter that determines whether a particular bit pattern indicative of a certain virus exists in a given program, and all current virus filters work this way. The drawback of this method is that the bit pattern of the virus must be known in advance, so these simple filters will not detect new viruses or old viruses that have mutated.
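A minimal sketch of such a bit-pattern filter, assuming a program is available as raw bytes. The signature table is hypothetical: its names and byte patterns are invented for illustration and do not correspond to any real virus. The usage lines show both the strength and the drawback: a known pattern is found, but changing a single byte of it defeats the scan.

```python
from typing import Dict, List

# Hypothetical signature table (invented names and patterns).
SIGNATURES: Dict[str, bytes] = {
    "example-virus-a": bytes.fromhex("deadbeef0badf00d"),
    "example-virus-b": b"\x90\x90\xcd\x21\xeb\xfe",
}

def scan(program: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all known signatures found anywhere in `program`."""
    return [name for name, pattern in signatures.items() if pattern in program]

clean = b"\x00" * 64
infected = clean[:10] + SIGNATURES["example-virus-a"] + clean[10:]
mutated = infected.replace(b"\xde\xad", b"\xde\xae")   # one byte of the pattern changed

print(scan(clean), scan(infected), scan(mutated))
```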

A different type of virus filter would separate programs into three classes: those that contain viruses, those that do not, and those for which the filter cannot decide. Programs in the third class would then need to be examined by other methods. Such filters will probably be system dependent and are at least five to ten years away.

By recording the state of the file system and examining these "snapshots" in conjunction with audit records, it is possible to tell whether files are being modified without permission. Such techniques can be used to identify a virus after it has been detected, but they are currently not feasible for detecting viruses.

The runtime models for virus detection are Program Flow Monitors and N-Version Programming [JOS88]. A program can be uniquely characterized by trace information generated as it executes. The expected trace information is generated at compile time and checked against the executing program by a program flow monitor. This method requires changes to compiler design so that the trace information can be calculated, and it also imposes a significant runtime overhead.
N-Version Programming consists of executing several copies of a program simultaneously and then comparing their outputs. This method will detect a virus that has been inserted into some, but not all, of the copies of the program. It will not protect against fast-spreading viruses that infect every copy of the program, and it increases overhead compared to a single execution of the program.
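A minimal sketch of the N-version comparison, assuming each copy is a function of one integer input. For simplicity the copies run one after another rather than simultaneously, and `tampered_square` is a hypothetical modified copy. The last usage line shows the limitation noted in the text: when every copy has been modified, the outputs agree and nothing is detected.

```python
from typing import Callable, List, Sequence

def n_version_run(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the agreed output.
    Raises RuntimeError when the versions disagree."""
    outputs: List[int] = [version(x) for version in versions]
    if len(set(outputs)) != 1:
        raise RuntimeError(f"versions disagree on input {x}: {outputs}")
    return outputs[0]

def square(x: int) -> int:
    return x * x

def tampered_square(x: int) -> int:
    # Hypothetical modified copy: behaves normally except for inputs above 10.
    return x * x + 1 if x > 10 else x * x

print(n_version_run([square, square, tampered_square], 3))   # 9: modification not exercised
try:
    n_version_run([square, square, tampered_square], 11)
except RuntimeError as err:
    print("detected:", err)
print(n_version_run([tampered_square] * 3, 11))               # 122: all copies modified, undetected
```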

Encrypting all files and decrypting them only when needed, using a password unavailable to viruses, will stop the spread of a virus. When an infected file is decrypted, the original program will have been changed, most likely causing a loss of functionality (especially when cipher block chaining is used; see section 2.3.3.2). For frequently executed programs, encryption involves a significant increase in overhead because of the computational complexity of encryption techniques.

1.5  Problem  Statement 

The Clark & Wilson model appears to be the most promising approach to maintaining integrity in the commercial world. The aspect of the Clark & Wilson model addressed in this thesis is the prevention of change to a file, as described by Clark and Wilson.

The United States Department of Defense has published criteria for rating systems with regard to confidentiality in the Trusted Computer System Evaluation Criteria (TCSEC) [DOD85], commonly known as the Orange Book. The TCSEC provides seven levels of ratings for the ability of systems to maintain confidentiality, ranging from A1, which requires a verified design, down to D, which provides minimal protection. Though confidentiality does not automatically translate into integrity, the two share many features. In particular, confidentiality does not protect against viruses [COH84].

The access control implemented on most systems provides weaker protection than that required for an Orange Book rating of A or B. This gap between practice and the standard provides many opportunities for the hidden destruction of integrity. Commercial systems far outnumber highly secure military and national systems, yet to date there has been less concern with commercial systems; this thesis concentrates on them. Most commercial systems have some form of limited access control. An additional problem with commercial access control is that security was not a highly valued design criterion, so its implementation tends to be less than desirable. To maintain reasonable confidence that integrity is preserved, both of these problems must be solved. In the foreseeable future, access control does not offer an adequate means of protecting commercial systems against change.

The threat that change prevention must address is that an attacker can change data without the user knowing it has been changed. When such a switch has occurred, the user will believe the data has integrity when it actually does not. This deception succeeds when the original set of data has the same checksum as the changed (or new) set of data inserted by the attacker. The attacker may either have legitimate access to change the data (but wish to disguise the fact that it has been changed) or be a third party who wishes to insert forged data. The case of an attacker with legitimate access to change the data is known as a "Birthday Attack," described in section 1.3.3.2. This thesis is concerned only with cases where the attacker does not have legitimate access to change the data, i.e., third-party attacks.
In the field of cryptography, it is assumed that the attacker has unlimited state-of-the-art resources to employ against the encryption. This thesis does not make that assumption. Instead, it assumes that reasonable resources will be expended to discover a set of data that advances the purpose of the attacker and produces the same checksum. Expending unlimited resources is economically unreasonable and, in addition, would call attention to the attack and prompt system authorities to take appropriate action.

The problem this thesis addresses is the testing of checksumming methods for use as deterrents to the integrity threat posed by viruses. General methods for construction and testing will be developed, along with checksum algorithms that are secure against viruses. The checksum algorithms used are variations of the QCMDCV4 [JUE86] algorithm; this algorithm and the modifications to it created for this thesis are discussed in Chapter 2. These algorithms will be tested on DEC VAX 11/780, AT&T 3B2, and Harris HCX-9 systems and used to calculate checksums on a relatively large number of programs. The results will be analyzed to determine the efficiency and effectiveness of such methods. An implementation of one of these algorithms will be demonstrated using the MINIX operating system.

The remainder of this thesis is organized as follows:

Chapter  2.  Error  Detection  with  Checksums. 

Chapter  3.  Testing  of  checksum  algorithms. 

Chapter  4.  Implementation  considerations. 

Chapter  5.  Conclusions  and  further  research  suggestions. 
Chapter 2       Error Detection with Checksums

This chapter discusses the protection that checksum algorithms provide against errors in general and against forgeries and viruses in particular. The discussion includes a general description of checksums, the features of checksum algorithms (including those that provide protection against forgery and viruses), and methods of constructing checksum algorithms.

A checksum, or digital signature, is any fixed-length block functionally dependent on every bit of the message, so that different messages will have different checksums with high probability [DEN82]. Checksums are used to detect changes or errors introduced in messages or sets of data between the time the checksum was created and the current time. A checksum on a set of data is generated by a checksum algorithm. Examples of checksum algorithms include Cyclic Redundancy Codes (CRC), used in networks, and cipher block chaining using the Data Encryption Standard (DES).

2.1  Features  Required  of  General  Checksum  Algorithms 

To detect errors introduced in a set of data, a good general checksum algorithm must produce checksums that have the features of even mapping, overdeterminism, and permutation sensitivity [JUE86].

2.1.1  Even  Mapping 

Even mapping refers to the uniformity of the distribution of checksums generated by a given population of programs. Sets of data map evenly to checksums if, over the set of all possible programs to be checksummed, the probability of generating any given checksum is approximately equal to the probability of generating any other checksum. Ideally, the checksums of two sets of data A and B would be identical if and only if A and B are themselves identical [JUE86]. Since the mapping from sets of data to checksums is many-to-one (the sets of data can be of any length while the checksum is a fixed-length block, so many sets of data share every checksum), the probability of two different sets of data having the same checksum should instead not differ significantly from $2^{-n}$, where $n$ is the number of bits in the checksum. With a checksum algorithm that exhibits even mapping, each changed or erroneous set of data is judged unchanged with probability $2^{-n}$, so roughly $\ln(2) \cdot 2^{n}$ such sets can occur before the chance that at least one of them goes undetected reaches one half.
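Under even mapping, each changed set of data escapes detection with probability $2^{-n}$. Assuming the changes are independent, the number of changes $k$ after which the chance of at least one undetected change reaches one half solves $1 - (1 - 2^{-n})^k = 1/2$, which gives $k \approx \ln(2) \cdot 2^n$. A small sketch that solves this exactly and compares it with the approximation; the function name is illustrative:

```python
import math

def trials_for_even_odds(n_bits: int) -> float:
    """Number of independent changes after which P(at least one undetected) = 1/2,
    when each change is missed with probability 2**-n_bits."""
    p_miss = 2.0 ** -n_bits
    # Solve 1 - (1 - p)**k = 1/2; log1p keeps precision when p is tiny.
    return math.log(0.5) / math.log1p(-p_miss)

for n in (16, 32, 64):
    exact = trials_for_even_odds(n)
    approx = math.log(2) * 2 ** n
    print(n, f"{exact:.6g}", f"{approx:.6g}")
```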

2.1.2  Overdeterminism 

An overdetermined checksum algorithm is one in which the resulting checksum is a function of all the bits of the set of data being checksummed. If a checksum algorithm is not overdetermined, errors in bits that do not affect the checksum go undetected. Overdeterminism is therefore crucial if errors are to be detected with the probability implied by the even mapping feature.

2.1.3 Permutation Sensitivity

A checksum algorithm is permutation sensitive if it produces a different checksum for each permutation of the data elements. A permutation-sensitive checksum algorithm operating on a set of data ABC produces a different checksum than it does on any permutation of that data, i.e., ACB, BAC, BCA, CAB, or CBA.
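A small sketch contrasting two toy checksums on the six permutations of ABC. Both functions are illustrative assumptions, not algorithms from the text: a plain byte sum ignores order, while a Horner-style positional weighting (multiplier 31, modulus $2^{16}$, both arbitrary choices) distinguishes every permutation. The positional checksum is still linear and is shown only for permutation sensitivity, not forgery resistance.

```python
from itertools import permutations

def additive_checksum(data: bytes) -> int:
    # Sum of bytes modulo 2**16: the order of the bytes does not matter.
    return sum(data) % 2**16

def positional_checksum(data: bytes) -> int:
    # y = 31*y + b (mod 2**16): each byte's weight depends on its position.
    y = 0
    for b in data:
        y = (31 * y + b) % 2**16
    return y

for name, fn in (("additive", additive_checksum), ("positional", positional_checksum)):
    values = {fn(bytes(p)) for p in permutations(b"ABC")}
    print(name, len(values), "distinct checksums for 6 permutations")
```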
2.2 Forgery

General checksum algorithms are designed to detect errors, or bursts of errors, that occur randomly. If an attacker knows the general checksum algorithm, it is relatively easy to surreptitiously insert a different set of data (a forgery) that generates the same checksum under that algorithm. The two factors that increase the protection of checksum algorithms against forgeries in general, and viruses in particular, are the length of the checksum and the difficulty of inverting the checksum algorithm (i.e., the absence of trap doors).

2.2.1  Length  of  the  Checksum 

The length of a checksum is its length in bits. The checksum should be long enough that the cost of generating enough variations to find a suitable forgery (a brute force attack) is unacceptably high. On average, about $2^{n}$ variations must be generated to produce a forgery with the same $n$-bit checksum [JUE86]. The length of the checksum is thus the primary deterrent to brute force attacks.
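A minimal brute force sketch against a deliberately short checksum. The checksum here is a stand-in chosen for illustration (the top 16 bits of SHA-256), not an algorithm from the text; the attacker simply appends a counter to the desired forged data until the checksums match. The expected number of trials grows as $2^n$, so every added bit doubles the work.

```python
import hashlib
from itertools import count
from typing import Tuple

def short_checksum(data: bytes, n_bits: int = 16) -> int:
    # Stand-in n-bit checksum: the top n_bits of SHA-256 (an assumption).
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)

def brute_force_forgery(original: bytes, payload: bytes, n_bits: int = 16) -> Tuple[bytes, int]:
    """Generate variations of `payload` until one matches the original checksum."""
    target = short_checksum(original, n_bits)
    for trial in count(1):
        candidate = payload + b"#" + str(trial).encode()
        if short_checksum(candidate, n_bits) == target:
            return candidate, trial

forged, trials = brute_force_forgery(b"original program", b"forged program")
print(trials, "variations tried; expected on the order of", 2 ** 16)
```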

2.2.2 Noninvertible Algorithms

A noninvertible algorithm, described in section 1.3.3.3, is a function that cannot be inverted. Thus, given a checksum and the checksum algorithm, an attacker cannot generate an algorithm that produces sets of data which, when taken as the argument of the checksum algorithm, result in the original checksum. If a checksum algorithm cannot be inverted, then it has no trap doors and is not susceptible to a trap door attack.

2.3  Construction 

The general techniques used to construct checksums are similar to those used in constructing ciphertext. These techniques, described below, are substitution, transposition, and feedback.

2.3.1  Substitution 

Substitution involves replacing each block of plaintext with a corresponding block from the ciphertext alphabet. If the message is written in the plaintext alphabet $\{a_0, a_1, \ldots, a_{n-1}\}$, then the corresponding ciphertext alphabet is $\{f(a_0), f(a_1), \ldots, f(a_{n-1})\}$, where $f()$ is a one-to-one mapping from plaintext blocks to ciphertext blocks. A simple example of substitution is to exclusive-or each character of the plaintext message with a constant to arrive at its ciphertext equivalent.
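The exclusive-or example can be sketched directly, assuming one-byte characters and an arbitrary illustrative constant `0x2A`. Because XOR with a fixed constant is one-to-one, applying the same substitution a second time recovers the plaintext.

```python
def xor_substitute(message: bytes, key: int) -> bytes:
    """Substitute every byte b with b XOR key (key must fit in one byte)."""
    if not 0 <= key <= 255:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in message)

cipher = xor_substitute(b"RENAISSANCE", 0x2A)
print(cipher.hex())
print(xor_substitute(cipher, 0x2A))  # b'RENAISSANCE'
```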

2.3.2  Transposition 

Transposition is the rearranging of bits or characters according to some scheme. Transposition was classically done with the aid of some type of geometric figure [DEN82]. An example given in Denning is the permutation of the characters of the plaintext with a fixed period $d$. A plaintext message $M = m_1 \ldots m_d\, m_{d+1} \ldots m_{2d} \ldots$ is transposed into the ciphertext message $C = m_{f(1)} \ldots m_{f(d)}\, m_{d+f(1)} \ldots m_{d+f(d)} \ldots$. For example, suppose that for $d = 4$ the permutation is [DEN82]:

i:     1  2  3  4
f(i):  2  4  1  3

Then the message M = RENA ISSA NCE is transposed into C = EARN SAIS CNE.
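A short sketch of the periodic transposition, taking $f$ as a 1-indexed list as in the table. Output position $i$ of each block receives the block's character $f(i)$; the handling of the short final block (positions past its end are skipped) is an assumption that reproduces CNE in the example.

```python
from typing import Sequence

def transpose(message: str, f: Sequence[int]) -> str:
    """Periodic transposition with period d = len(f); f is 1-indexed."""
    d = len(f)
    out = []
    for start in range(0, len(message), d):
        block = message[start:start + d]
        for fi in f:
            if fi - 1 < len(block):     # skip positions missing from a short final block
                out.append(block[fi - 1])
    return "".join(out)

print(transpose("RENAISSANCE", [2, 4, 1, 3]))  # EARNSAISCNE
```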
2.3.3 Feedback

Feedback is the use of previous information in the computation of the ciphertext of the current block. This feedback mechanism can be expressed as $Y_i = f(g(X_i), Y_{i-1}, Y_{i-2}, \ldots, Y_0)$, where $Y_i$ is the ciphertext for block $i$, $g()$ is the encryption function, $f()$ is the feedback function, $X_i$ is the plaintext for block $i$, and $Y_0$ is an initialization vector. Since the ciphertext of the last block contains information on all the previous blocks, the last block can be used as the checksum. The two most prevalent methods using feedback are Cipher Feedback Mode and Cipher Block Chaining, which are discussed below along with non-linear feedback.

2.3.3.1  Cipher  Feedback  Mode 

In Cipher Feedback (CFB) mode, ciphertext is fed back into the algorithm to generate a cryptographic bit stream. An initial bit stream, $Y_0$, is used until there is ciphertext to combine with the plaintext bit stream; thereafter the stream is a function of the previous $k$ bits of output. To obtain the ciphertext $Y_i$, the plaintext $X_i$ is added modulo 2 to $Y_{i-1}$ [JUE83], i.e., $Y_i = X_i \oplus Y_{i-1}$. The fed-back bit stream may be shifted and then encrypted to enhance the security of the resultant bit stream.

Figure: Cipher Feedback Mode (CFB)
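A minimal sketch of a CFB-style checksum, assuming full 64-bit feedback and that the optional encryption of the fed-back block is applied, so that $Y_i = X_i \oplus E(Y_{i-1})$. Keyed BLAKE2b from Python's `hashlib` stands in for the block cipher $E$ (an assumption; it is not DES), and the block size and zero padding are likewise assumptions. The last ciphertext block is returned as the checksum.

```python
import hashlib
from typing import List

BLOCK = 8  # bytes per block (assumed; 64 bits, the DES block size)

def encrypt_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for E: keyed BLAKE2b truncated to one block. Not DES and not
    # invertible, which is acceptable because CFB never needs E's inverse.
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()

def split_blocks(data: bytes) -> List[bytes]:
    # Zero-pad the final block (assumed padding; a real scheme should also encode the length).
    data += b"\x00" * (-len(data) % BLOCK)
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_checksum(data: bytes, key: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    y = iv                                        # Y_0, the initial bit stream
    for x in split_blocks(data):
        y = xor_bytes(x, encrypt_block(key, y))   # Y_i = X_i XOR E(Y_{i-1})
    return y                                      # last ciphertext block is the checksum

print(cfb_checksum(b"hello world", b"secret key").hex())
```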
2.3.3.2 Cipher Block Chaining

In the Cipher Block Chaining (CBC) mode of operation, successive blocks of ciphertext are defined as $Y_i = f(X_i \oplus Y_{i-1})$, where $Y_0$ is the initialization vector, $f()$ is the block encryption function, and $\oplus$ denotes bit-by-bit modulo 2 addition (exclusive-or).

Cipher block chaining is more efficient than CFB in that it uses a single execution of the block encryption algorithm for each block, whereas CFB with $k$-bit feedback needs one execution for every $k$ bits.


Figure: Cipher Block Chaining (CBC)
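A minimal sketch of a CBC-style checksum, $Y_i = f(X_i \oplus Y_{i-1})$, returning the last block. As with any sketch here, keyed BLAKE2b stands in for the block encryption function $f$ (an assumption; a real deployment would use DES or another block cipher), so the sketch can compute checksums but cannot decrypt. The block size, padding, and key are also assumptions. Unlike a plain X-OR of blocks, swapping two blocks changes the result.

```python
import hashlib
from typing import List

BLOCK = 8  # bytes per block (assumed)

def encrypt_block(key: bytes, block: bytes) -> bytes:
    # Stand-in for the block cipher f: keyed BLAKE2b truncated to one block (not DES).
    return hashlib.blake2b(block, key=key, digest_size=BLOCK).digest()

def split_blocks(data: bytes) -> List[bytes]:
    data += b"\x00" * (-len(data) % BLOCK)   # zero-pad the final block (assumption)
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_checksum(data: bytes, key: bytes, iv: bytes = bytes(BLOCK)) -> bytes:
    y = iv                                        # Y_0, the initialization vector
    for x in split_blocks(data):
        y = encrypt_block(key, xor_bytes(x, y))   # Y_i = f(X_i XOR Y_{i-1})
    return y                                      # last ciphertext block is the checksum

key = b"secret key"
print(cbc_checksum(b"AAAAAAAABBBBBBBB", key).hex())
print(cbc_checksum(b"BBBBBBBBAAAAAAAA", key).hex())   # blocks swapped: different checksum
```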


2.3.3.3  Non-linear  Feedback 

A nonlinear function is a function $f()$ for which $x/f(x)$ is not a constant. A nonlinear feedback function, $f(X_i, Y_{i-1}, Y_{i-2}, \ldots, Y_0) = Y_i$, provides positional dependence [JUE83], i.e., permutation protection. Non-linear feedback provides a method for constructing a noninvertible checksum algorithm, though it is not always sufficient on its own. An example of a non-linear feedback function is $Y_i = (X_i + Y_{i-1})^2 \bmod N$, where $N$ is a constant. The checksum is the ciphertext of the last block, $Y_n$. Note that non-linear feedback makes the checksum depend on every bit of the plaintext [JUE83].
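A sketch of the example function $Y_i = (X_i + Y_{i-1})^2 \bmod N$, with blocks taken as integers. The modulus $N = 2^{31} - 1$ (a prime with $N \bmod 4 = 3$) and the seed are assumptions chosen for illustration. The second function shows why nonlinearity is "not always sufficient": every chaining value $Y_i$ is a square modulo $N$, and for such a prime a square root is just $Y_i^{(N+1)/4} \bmod N$, so after inserting any desired block an attacker can compute a counterbalancing block that returns the chain to $Y_i$ and leaves the checksum unchanged.

```python
from typing import List

N = 2**31 - 1   # assumed modulus: a prime with N % 4 == 3
Y0 = 12345      # hypothetical initial seed

def chain(blocks: List[int], y: int = Y0) -> List[int]:
    """Chaining values Y_1..Y_n of Y_i = (X_i + Y_{i-1})**2 mod N."""
    values = []
    for x in blocks:
        y = pow(x + y, 2, N)
        values.append(y)
    return values

def nl_checksum(blocks: List[int]) -> int:
    values = chain(blocks)
    return values[-1] if values else Y0

def insert_with_counterbalance(blocks: List[int], i: int, x_j: int) -> List[int]:
    """Insert x_j after block i (1-based) plus a block x_k that returns the
    chain to Y_i, so all later chaining values and the checksum are unchanged."""
    if not 1 <= i <= len(blocks):
        raise ValueError("i must index an existing block")
    y_i = chain(blocks[:i])[-1]
    y_j = pow(x_j + y_i, 2, N)
    # Y_i is a square mod N; since N % 4 == 3, Y_i**((N+1)//4) is a square root of it.
    r = pow(y_i, (N + 1) // 4, N)
    x_k = (r - y_j) % N          # then (x_k + y_j)**2 == r**2 == Y_i (mod N)
    return blocks[:i] + [x_j, x_k] + blocks[i:]

original = [11, 22, 33, 44]
forged = insert_with_counterbalance(original, 2, 999)
print(forged, nl_checksum(original) == nl_checksum(forged))
```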
2.4 Algorithms

Using the methods of construction discussed in section 2.3, several checksum algorithms are presented in this section. These checksum algorithms fall into two categories: cryptographic and noncryptographic. Cryptographic algorithms require additional information, in the form of a key, to determine the checksum algorithm.

2.4.1  Cryptographic  Algorithms 

A  cryptographic  algorithm  is  an  algorithm  which  requires  additional  information  (known 
as  a  key)  to  be  used  for  encryption.  The  role  of  keys  will  be  discussed  below  along  with 
examples  of  different  types  of  encryption. 

2.4.1.1  Keys 

The function of the key is to hide the exact algorithm used from attackers. Hiding the algorithm allows the results of the cryptographic algorithm (either the ciphertext or the checksum) to be stored in a location that can be modified by an attacker. The attacker concentrates his/her efforts on determining the key to the algorithm since, if the key is known, any set of data and its checksum can be used as a forgery. In this scenario the legitimate user of the data generates a checksum with the cryptographic algorithm and compares it to the stored checksum (located in a modifiable location). Since the checksum of the forgery matches the stored checksum, the legitimate user accepts the forgery as valid.

Cryptographic algorithms must protect the identity of the key even when an attacker knows the general cryptographic algorithm and has a copy of plaintext and its corresponding ciphertext (a plaintext attack). Thus the function $h(p, c) = k$, where $p$ is the plaintext, $c$ is the ciphertext, and $k$ is the key, should be computationally hard to determine. Because the inversion function $h()$ must be computationally hard to determine, cryptographic algorithms are necessarily computationally complex and time consuming to execute.

The alternative to storing the cryptographically generated checksum (with a hidden key) in a modifiable location is to use a cryptographic algorithm with a hidden key and store the checksum in a location that is not modifiable by the attacker. The attacker would then need to determine the key before attempting any of the attacks described in section 1.3.3. This method is more secure for identifying changes in a set of data than relying on the power of cryptography alone.

2.4.1.2  Examples  of  Cryptographic  Algorithms 

The Rivest, Shamir, Adleman (RSA) [RIV79] cryptographic algorithm is a substitution cipher based on computing exponentials modulo a large number. The RSA algorithm with cipher block chaining can be used as a checksum algorithm. RSA is a patented public key encryption method based on the difficulty of factoring large numbers. The method has the property that $C = M^e \bmod n$ and $M = C^d \bmod n$, with $ed \bmod \phi(n) = 1$, where $M$ is the message, $C$ is the ciphertext, $e$ and $d$ are the keys, $n$ is the product of two large prime numbers chosen together with $e$ and $d$, and $\phi(n)$ is the Euler totient function [RIV79]. The Euler totient function $\phi(n)$ is the number of elements in the reduced set of residues modulo $n$; equivalently, $\phi(n)$ is the number of positive integers less than $n$ that are relatively prime to $n$ [DEN82].
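The relations above can be checked with a toy example. The primes $p = 61$ and $q = 53$, the exponent $e = 17$, and the message $M = 65$ are illustrative assumptions, far too small to be secure. The totient is computed by direct count, following the definition, and agrees with $(p-1)(q-1)$; the private key $d$ is the modular inverse of $e$ (three-argument `pow` with exponent $-1$ needs Python 3.8 or later).

```python
from math import gcd

def phi(n: int) -> int:
    """Euler totient by direct count: positive integers below n coprime to n."""
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

p, q = 61, 53                     # toy primes (insecure, for illustration only)
n = p * q                         # 3233
totient = phi(n)
assert totient == (p - 1) * (q - 1)

e = 17
d = pow(e, -1, totient)           # e*d mod phi(n) == 1
assert (e * d) % totient == 1

M = 65
C = pow(M, e, n)                  # C = M^e mod n
assert pow(C, d, n) == M          # M = C^d mod n
print(n, totient, d, C)
```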

RSA with large keys is very secure; key lengths of over 110 digits can be considered secure at this point in time. Using a plaintext attack, the number of computations required is on the order of $\exp\left(\sqrt{\ln(n)\,\ln(\ln(n))}\right)$ [DEN82]. Since the cost of RSA computations also grows polynomially with key length [USE89], key length is crucial to both computational intensity and security.

The disadvantage of using RSA as a checksum algorithm is the high computational cost of calculating a checksum, which precludes its use for checksums on all but the fastest computers.

Cohen [COH88] suggested a method to reduce the computational cost of using RSA for checksums. Instead of encrypting each block of data, Cohen suggested first breaking the data into larger fixed-size segments. Each segment is reduced in size by taking it modulo a large prime, and RSA with cipher block chaining is then applied to the reduced segments. The last block of ciphertext is used as the checksum. Because fewer RSA block encryptions are necessary, this method requires far less computation than applying RSA to every block.

Cohen's  original  method  illustrates  the  difficulty  of  detecting  and  preventing  trap  doors. 
In  some  instances,  the  checksum  did  not  depend  on  certain  parts  of  the  file  making  it 
possible  to  determine  a  set  of  programs  that  had  the  same  checksum  [COH88].  Cohen 
has  subsequently  published  a  revised  algorithm  which  corrects  this  problem.  [COH88] 

The Data Encryption Standard (DES) was the official scheme approved by the National Bureau of Standards [NBS78] in 1978 for use by federal departments and agencies in the cryptographic protection of unclassified computer data. DES is a block cipher that applies a product cipher to each individual block. Formally, DES encryption of each 64-bit block $P$ of plaintext may be described as the product cipher

$DES = (IP^{-1}) \circ T_{16} \circ \cdots \circ T_1 \circ (IP)$

where $IP$ is a bitwise initial permutation with inverse $IP^{-1}$. The 64-bit result of the permutation is expressed as the concatenation of two 32-bit halves:

$IP(P) = L_0 R_0$

Each round $T_i(L_{i-1} R_{i-1}) = L_i R_i$, for $1 \le i \le 16$, is defined as:

$L_i = R_{i-1}$, $R_i = L_{i-1} \oplus f(R_{i-1}, K_i)$

where each round key $K_i$ is derived from the secret 56-bit key $K$ (the private key). The ciphertext is given by $C = IP^{-1}(R_{16} L_{16}) = DES(P)$.

The security of DES derives from the nonlinear many-to-one function $f$, which is applied to the $R_{i-1}$ half blocks. Transposition and substitution are the main internal components of $f$ [MAE87]. DES can be used with cipher block chaining or cipher feedback mode as a checksum algorithm.
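The round structure $L_i = R_{i-1}$, $R_i = L_{i-1} \oplus f(R_{i-1}, K_i)$ can be sketched on 32-bit halves. This is not DES: the round function `f` below is a hypothetical stand-in (a truncated SHA-256 of $R$ and $K_i$), the initial permutation and key schedule are omitted, and the round keys are made-up integers. It does show why a many-to-one $f$ is acceptable: running the same network with the round keys reversed undoes the encryption without ever inverting $f$.

```python
import hashlib
from typing import Sequence, Tuple

def f(r: int, k: int) -> int:
    # Hypothetical round function standing in for DES's f: many-to-one, never inverted.
    digest = hashlib.sha256(r.to_bytes(4, "big") + k.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:4], "big")

def feistel_encrypt(l0: int, r0: int, round_keys: Sequence[int]) -> Tuple[int, int]:
    l, r = l0, r0
    for k in round_keys:
        l, r = r, l ^ f(r, k)   # T_i: L_i = R_{i-1}, R_i = L_{i-1} XOR f(R_{i-1}, K_i)
    return r, l                 # output R_16 L_16 (IP and IP^-1 omitted)

def feistel_decrypt(l: int, r: int, round_keys: Sequence[int]) -> Tuple[int, int]:
    # The same network with the round keys in reverse order undoes encryption.
    return feistel_encrypt(l, r, list(reversed(round_keys)))

keys = list(range(1, 17))       # hypothetical round keys K_1..K_16
c = feistel_encrypt(0x01234567, 0x89ABCDEF, keys)
print([hex(v) for v in c], feistel_decrypt(*c, keys) == (0x01234567, 0x89ABCDEF))
```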

One advantage of using DES for generating checksums is that the DES algorithm is available as a chip that can be incorporated in the computer. When DES is implemented in software, generating checksums is time consuming because the algorithm is computationally intensive.

2.4.2  Noncryptographic  Checksum  Algorithms 

A noncryptographic checksum algorithm is a checksum function that does not require additional information in the form of a key. Noncryptographic checksum algorithms are constructed using the methods of substitution, transposition, and feedback described in section 2.3. The examples below show specific operations that can be used in checksum algorithms. Note that some of these algorithms were not designed to resist active forgery or, if they were, only when the checksum is appended to the end of a set of data and the resulting set is encrypted. The noncryptographic checksum algorithms examined range from the simple X-OR and K-bit Linear Addition, through the moderately complex Cyclical Redundancy Checksum, to the more complex Quadratic Congruential Manipulation Detection Code (QCMDC) and its variations.

X-OR Checksum. This simple checksum technique exclusive-ors the blocks of a message together: $Y = X_1 \oplus X_2 \oplus \cdots \oplus X_n$, where $X_i$ are the blocks of the message. This X-OR checksum algorithm was initially proposed by the National Bureau of Standards and appeared in the original draft of Federal Standard 1026 [JUE83]. The exclusive-or is the feedback mechanism that ensures the checksum depends on all the bits of the original data. This simple checksum is very susceptible to attacks such as inserting the same block of data twice while keeping the rest of the message the same ($X \oplus X \oplus Y = Y$). Additionally, blocks of data can be transposed without detection. Even if this simple checksum is added to a message which is then encrypted by DES with either Cipher Feedback (CFB) or Cipher Block Chaining (CBC), manipulation detection is still not provided, even if the key is not known [JUE83].
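A short sketch of the X-OR checksum and the two manipulations described above, assuming 4-byte blocks and zero padding (both assumptions). Inserting the same block twice and transposing blocks both leave the checksum unchanged.

```python
from functools import reduce
from typing import List

BLOCK = 4  # block size in bytes (assumed)

def split_blocks(data: bytes) -> List[bytes]:
    data += b"\x00" * (-len(data) % BLOCK)   # zero-pad the last block
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def xor_checksum(data: bytes) -> bytes:
    # Y = X_1 XOR X_2 XOR ... XOR X_n, starting from an all-zero block
    return reduce(lambda acc, blk: bytes(a ^ b for a, b in zip(acc, blk)),
                  split_blocks(data), bytes(BLOCK))

msg = b"PAY ALICE 100 DOLLARS..."            # 24 bytes: six 4-byte blocks
blocks = split_blocks(msg)
duplicated = b"".join(blocks[:2] + [b"EVIL", b"EVIL"] + blocks[2:])  # same block inserted twice
swapped = b"".join([blocks[1], blocks[0]] + blocks[2:])              # first two blocks transposed
print(xor_checksum(msg) == xor_checksum(duplicated) == xor_checksum(swapped))  # True
```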

K-bit Linear Addition. In this type of algorithm the blocks of data are added modulo $2^k$: $Y = (X_1 + X_2 + \cdots + X_n) \bmod 2^k$, where $Y$ is the resulting checksum, $k$ is a constant, and $X_i$ are the blocks of data [MEY82]. To forge a checksum, an attacker inserts the desired blocks while changing or reducing other blocks to match the proper checksum. The K-bit Linear Addition algorithm was designed to be used in the same manner as the X-OR algorithm, i.e., with a checksum generated and then the entire message encrypted. It provides more protection than the X-OR algorithm, but it does not provide acceptable protection against manipulation of the data, especially transposition of blocks [JUE83].
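A sketch of the forgery described above, assuming $k = 32$ and blocks represented as integers below $2^k$. The attacker replaces one block with a chosen value and subtracts the difference from another block, so the modular sum is unchanged; reversing the block order also leaves the sum unchanged. The helper names are illustrative.

```python
from typing import List

K = 32
MOD = 2 ** K   # blocks and checksum are K-bit values (K is an assumed constant)

def linear_checksum(blocks: List[int]) -> int:
    return sum(blocks) % MOD

def forge(blocks: List[int], index: int, new_value: int, adjust_index: int) -> List[int]:
    """Replace blocks[index] with new_value and compensate in blocks[adjust_index]."""
    if index == adjust_index:
        raise ValueError("the compensating block must be a different block")
    forged = list(blocks)
    delta = new_value - forged[index]
    forged[index] = new_value
    forged[adjust_index] = (forged[adjust_index] - delta) % MOD
    return forged

original = [100, 200, 300, 400]
forged = forge(original, 1, 0xDEADBEEF, 3)
print(linear_checksum(original) == linear_checksum(forged),
      linear_checksum(original[::-1]) == linear_checksum(original))
```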

Cyclical Redundancy Checksum (CRC). This method includes a set of checksum algorithms
that are widely used to detect errors in messages passed over a network and that are often
implemented in hardware for efficiency. In outline, a polynomial of degree n is chosen:

$f(x) = c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \cdots + c_1 x + c_0$, where each $c_i$ is either 0 or 1.

The checksum is the block of data, n bits long, that must be concatenated to the right-hand
side of the data to be checksummed such that the combined data and checksum, when
divided modulo two by the chosen polynomial, give a remainder of zero. The choice of
polynomial is important in detecting errors. For example, if the polynomial has (x-1) as a
factor (in modulo-two arithmetic this is the same as x+1), then the checksum will detect
every error in which an odd number of bits are changed [TAN88]. Typical polynomials
include CRC-16: $f(x) = x^{16} + x^{15} + x^2 + 1$ and CRC-12: $f(x) = x^{12} + x^{11} + x^3 + x^2 + x + 1$
[TAN88].
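The C sketch below performs the CRC-16 division described above, one bit at a time. It has these properties, all of which are assumptions made for illustration:

- It processes the message most significant bit first.
- It starts from a zero remainder.
- It applies no final inversion.

Deployed CRC-16 variants differ in bit order, initial value, and final XOR. The function name is hypothetical. Appending the 16-bit remainder to the message, high byte first, gives data whose remainder is zero. This is the property the text describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC with generator x^16 + x^15 + x^2 + 1 (0x8005; the x^16 term is implicit).
   Returns the remainder of M(x) * x^16 divided by the generator, modulo two:
   zero initial remainder, no bit reflection, no final XOR. */
uint16_t crc16(const uint8_t *data, size_t len)
{
    uint16_t rem = 0;
    for (size_t i = 0; i < len; i++) {
        rem ^= (uint16_t)(data[i] << 8);             /* bring the next byte into the top */
        for (int bit = 0; bit < 8; bit++) {
            if (rem & 0x8000)
                rem = (uint16_t)((rem << 1) ^ 0x8005); /* subtract (XOR) the generator */
            else
                rem = (uint16_t)(rem << 1);
        }
    }
    return rem;
}
```

The polynomial $x^{16} + x^{15} + x^2 + 1$ evaluates to 0 at x = 1 modulo two, so it has (x+1) as a factor. As a result, flipping an odd number of bits in the protected data always leaves a nonzero remainder.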

Quadratic Congruential Manipulation Detection Code (QCMDC) [JUE83]. This algorithm
is an example of the use of nonlinear feedback. The QCMDC algorithm is
$Y_i = (X_i + Y_{i-1})^2 \bmod N$, with $Y_0$ an initial seed and $N$ a large prime number. Nonlinearity
is introduced by the squaring. The modular arithmetic allows the precision to be specified
in advance. The QCMDC algorithm has a trap door in that it is possible to insert the
desired blocks and calculate counterbalancing blocks to add in order to maintain the same
checksum. For example, to insert block j between blocks i and i+1, it is necessary to
determine $X_j$ such that, given $Y_i = (X_i + Y_{i-1})^2 \bmod N$, the running value satisfies
$Y_j = Y_i = (X_j + Y_i)^2 \bmod N$. The nonlinearity makes it more difficult to calculate
the additional blocks to insert into the data than with either the X-OR checksum or the
K-bit Linear Addition checksum.
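The C sketch below implements the QCMDC recurrence. It also shows one way to compute a counterbalancing block. It assumes, as for any noncryptographic checksum, that the attacker knows N and the seed and can therefore compute the intermediate values.

Suppose arbitrary desired blocks, inserted after block i, leave the running value at z. Let $r = (X_i + Y_{i-1}) \bmod N$. Then the block $(r - z) \bmod N$ gives $((r - z) + z)^2 \equiv r^2 \equiv Y_i \pmod N$. The original running value is restored, and so the final checksum is unchanged.

The function names and the 32-bit limits are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* QCMDC: Y_i = (X_i + Y_{i-1})^2 mod N, starting from Y_0 = seed.
   N is assumed to be below 2^32, so the square fits in 64 bits. */
uint32_t qcmdc(const uint32_t *x, size_t n, uint32_t seed, uint32_t N)
{
    uint64_t y = seed % N;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = ((uint64_t)x[i] + y) % N;
        y = (t * t) % N;
    }
    return (uint32_t)y;
}

/* Counterbalancing block: if inserted blocks left the running value at z,
   the block (r - z) mod N with r = (X_i + Y_{i-1}) mod N restores Y_i,
   because ((r - z) + z)^2 = r^2 = Y_i (mod N). */
uint32_t qcmdc_counterbalance(uint32_t x_i, uint32_t y_prev, uint32_t z, uint32_t N)
{
    uint64_t r = ((uint64_t)x_i + y_prev) % N;
    return (uint32_t)((r + N - (uint64_t)z % N) % N);
}
```

For example, take the file {1000, 2000, 3000} with N = 65521 and seed 12345. The forged file consists of block 1000, two arbitrary desired blocks, the computed counterbalancing block, and then 2000 and 3000. Both files produce the same QCMDC value.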

MDC2. This algorithm, which I created, is a variation of the QCMDC checksum
algorithm. It consists of a simple combination of exclusive-or (^), modulo division (mod),
squaring (**2), addition (+), subtraction (-), and transposition of data in a two-equation
format. The substitution on a byte level is provided by the exclusive-or and the
addition,  transposition  is  provided  by  the  changing  of  the  order  of  the  two  data  terms 
between  the  two  equations,  nonlinearity  is  introduced  by  the  squaring  operation  followed 
by  the  modulo  division,  and  feedback  is  provided  by  the  two  equations  depending  on  the 
results  of  the  previous  equations  for  the  byte  level  substitution.  The  use  of  two  equations 
reduces  the  ability  to  determine  a  successful  trap  door  attack  because  both  equations  must 
be  satisfied  before  a  forgery  can  be  found. 

Pseudo  Code  for  MDC2: 

    N1 = large prime a
    N2 = large prime b
    M1 = large prime c
    M2 = large prime d
    While data in file
        read first block of data into T1
        read second block of data into T2
        M1 = (((M1 ^ T1) + (M2 ^ T2))**2) mod N1
        M2 = (((M2 ^ T1) + (M1 ^ T2))**2) mod N2
    Endwhile
    Checksum = M1 concatenated with M2

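A minimal C sketch of MDC2 follows. It makes these assumptions:

- Blocks are 16 bits, so M1 and M2 are 16-bit values and the checksum is 32 bits.
- Each ^ is exclusive-or, applied before the addition.
- The primes and seeds are hypothetical, since the actual constants are not given.

As in the pseudocode, M1 is updated first and the M2 equation uses the new M1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the actual primes and seeds are not published. */
#define MDC2_N1 65521u      /* modulus for M1, a prime below 2^16 */
#define MDC2_N2 65519u      /* modulus for M2, a prime below 2^16 */
#define MDC2_SEED1 257u     /* initial M1 */
#define MDC2_SEED2 263u     /* initial M2 */

/* ((a ^ t1) + (b ^ t2))^2 mod n; operands are below 2^16, so the square fits in 64 bits. */
static uint32_t mdc2_step(uint32_t a, uint32_t t1, uint32_t b, uint32_t t2, uint32_t n)
{
    uint64_t s = (uint64_t)(a ^ t1) + (b ^ t2);
    return (uint32_t)((s * s) % n);
}

/* MDC2 over 16-bit blocks taken two at a time; an odd final block is paired with 0. */
uint32_t mdc2(const uint16_t *blocks, size_t count)
{
    uint32_t m1 = MDC2_SEED1, m2 = MDC2_SEED2;
    for (size_t i = 0; i < count; i += 2) {
        uint32_t t1 = blocks[i];
        uint32_t t2 = (i + 1 < count) ? blocks[i + 1] : 0;
        m1 = mdc2_step(m1, t1, m2, t2, MDC2_N1);
        m2 = mdc2_step(m2, t1, m1, t2, MDC2_N2);   /* uses the updated M1 */
    }
    return (m1 << 16) | m2;                         /* M1 concatenated with M2 */
}
```

With these constants, an empty input returns the two seeds. Two zero blocks give M1 = 520² mod 65521 = 8316 and M2 = 8579² mod 65519 = 21404.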
MDC4. This algorithm, which I created, is a variation of MDC2 with increased feedback
mechanisms. To make trap door attacks harder to construct, the MDC4 checksum
algorithm uses four equations with four block-level substitutions. The use of
additional interrelated terms between the four equations increases the difficulty of finding
a function that will generate an executable file from a checksum.

Pseudo  Code  for  MDC4: 

    N1 = large prime a
    N2 = large prime b
    N3 = large prime c
    N4 = large prime d
    M1 = large prime e
    M2 = large prime f
    M3 = large prime g
    M4 = large prime h
    While data in file
        read first block of data into T1
        read second block of data into T2
        read third block of data into T3
        read fourth block of data into T4
        M1 = (((M1 ^ T1) + (M2 ^ T2) - (M3 ^ T3) + (M4 ^ T4))**2) mod N1
        M2 = (((M2 ^ T1) - (M3 ^ T2) + (M4 ^ T3) - (M1 ^ T4))**2) mod N2
        M3 = (((M3 ^ T1) + (M4 ^ T2) - (M1 ^ T3) + (M2 ^ T4))**2) mod N3
        M4 = (((M4 ^ T1) - (M1 ^ T2) + (M2 ^ T3) - (M3 ^ T4))**2) mod N4
    Endwhile
    Checksum = M1 concatenated with M2 concatenated with M3 concatenated with M4
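A corresponding C sketch of MDC4 follows. It makes these assumptions:

- Blocks are 16 bits and the checksum is 64 bits, the configuration used in the comparison with DES in section 2.4.3.
- The prime moduli and seeds are hypothetical.
- The equations use the cyclic pattern of M terms (M1 through M4 rotated in each equation), which is the same pattern MDC4T uses.

Signed 64-bit arithmetic is used because the alternating subtractions can make a sum negative before it is squared.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the actual primes and seeds are not published. */
static const int64_t MDC4_N[4] = {65521, 65519, 65497, 65479};  /* primes below 2^16 */
static const int64_t MDC4_SEED[4] = {257, 263, 269, 271};        /* initial M1..M4 */

/* Square a (possibly negative) sum and reduce it modulo n. */
static int64_t mdc4_sqmod(int64_t s, int64_t n)
{
    return (s * s) % n;          /* |s| < 2^18, so s*s fits and is non-negative */
}

/* MDC4 over 16-bit blocks taken four at a time; a short final group is padded with 0. */
uint64_t mdc4(const uint16_t *blocks, size_t count)
{
    int64_t m1 = MDC4_SEED[0], m2 = MDC4_SEED[1], m3 = MDC4_SEED[2], m4 = MDC4_SEED[3];
    for (size_t i = 0; i < count; i += 4) {
        int64_t t[4];
        for (size_t j = 0; j < 4; j++)
            t[j] = (i + j < count) ? blocks[i + j] : 0;
        m1 = mdc4_sqmod((m1 ^ t[0]) + (m2 ^ t[1]) - (m3 ^ t[2]) + (m4 ^ t[3]), MDC4_N[0]);
        m2 = mdc4_sqmod((m2 ^ t[0]) - (m3 ^ t[1]) + (m4 ^ t[2]) - (m1 ^ t[3]), MDC4_N[1]);
        m3 = mdc4_sqmod((m3 ^ t[0]) + (m4 ^ t[1]) - (m1 ^ t[2]) + (m2 ^ t[3]), MDC4_N[2]);
        m4 = mdc4_sqmod((m4 ^ t[0]) - (m1 ^ t[1]) + (m2 ^ t[2]) - (m3 ^ t[3]), MDC4_N[3]);
    }
    /* Checksum = M1 || M2 || M3 || M4, 16 bits each (64 bits total). */
    return ((uint64_t)m1 << 48) | ((uint64_t)m2 << 32) | ((uint64_t)m3 << 16) | (uint64_t)m4;
}
```

For four zero blocks, the first component is (257 + 263 - 269 + 271)² mod 65521 = 10400.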

MDC2T. This algorithm, which I created, is a variation of MDC2 with additional
feedback mechanisms to defeat trap door attacks. The checksum algorithm MDC2T employs
an additional substitution with feedback at the byte level. A term, TSS, is formed by
concatenating half of the first data block with half of the second data block. This term is
used as a feedback mechanism for substitution at the block level. This additional
feedback makes the task of determining trap doors more difficult.


Pseudo  Code  for  MDC2T: 

    N1 = large prime a
    N2 = large prime b
    M1 = large prime c
    M2 = large prime d
    While data in file
        read first block of data into T1
        read second block of data into T2
        TSS = MSH of T1 ored with MSH of T2
        M1 = (((M1 ^ T1) + (M2 ^ T2))**2 + TSS) mod N1
        M2 = (((M2 ^ T1) + (M1 ^ T2))**2 - TSS) mod N2
    Endwhile
    Checksum = M1 concatenated with M2
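A minimal C sketch of MDC2T follows, assuming 16-bit blocks and hypothetical primes and seeds. The pseudocode does not define MSH. This sketch assumes it is the most significant half (high byte) of a 16-bit block. Following the prose description, TSS is formed by concatenating the high byte of T1 with the high byte of T2. Because TSS is subtracted in the M2 equation, each result is reduced into [0, N) even when the difference is negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the actual primes and seeds are not published. */
#define MDC2T_N1 65521
#define MDC2T_N2 65519
#define MDC2T_SEED1 257
#define MDC2T_SEED2 263

/* Reduce v into [0, n) even when v is negative. */
static int64_t mdc2t_mod(int64_t v, int64_t n)
{
    int64_t r = v % n;
    return r < 0 ? r + n : r;
}

/* MDC2T over 16-bit blocks taken two at a time; an odd final block is paired with 0. */
uint32_t mdc2t(const uint16_t *blocks, size_t count)
{
    int64_t m1 = MDC2T_SEED1, m2 = MDC2T_SEED2;
    for (size_t i = 0; i < count; i += 2) {
        int64_t t1 = blocks[i];
        int64_t t2 = (i + 1 < count) ? blocks[i + 1] : 0;
        /* TSS: high byte of T1 followed by high byte of T2 (assumed reading of MSH). */
        int64_t tss = (t1 & 0xFF00) | (t2 >> 8);
        int64_t s1 = (m1 ^ t1) + (m2 ^ t2);
        m1 = mdc2t_mod(s1 * s1 + tss, MDC2T_N1);
        int64_t s2 = (m2 ^ t1) + (m1 ^ t2);     /* uses the updated M1 */
        m2 = mdc2t_mod(s2 * s2 - tss, MDC2T_N2);
    }
    return ((uint32_t)m1 << 16) | (uint32_t)m2;  /* M1 concatenated with M2 */
}
```

When the high bytes of both blocks are zero, TSS is zero and the step is the same as an MDC2 step. With these constants, two zero blocks give M1 = 8316 and M2 = 21404.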


MDC4T algorithm. This is a generalized version of the QCMDCV4 algorithm suggested
by Jueneman to improve upon the QCMDC algorithm [JUE86]. The true QCMDCV4
algorithm uses 32 bit blocks, resulting in a 128 bit checksum, and fixed values of the primes
and initial seeds. The MDC4T checksum algorithm has the general form of the
QCMDCV4 algorithm but can be used with shorter block lengths to facilitate efficient
computation. In order to introduce additional nonlinearity, substitution was added to the
QCMDC algorithm, which uses only feedback. The substitution is provided by
exclusive-oring intermediate checksum totals with the data before use. To prevent trap
doors, a transposed history function was added. The result is that there are multiple different
references to previous blocks that would need to be satisfied in order to surreptitiously
insert blocks of data. The MDC4T algorithm is:
    N1 = large prime a
    N2 = large prime b
    N3 = large prime c
    N4 = large prime d
    M1 = large prime e
    M2 = large prime f
    M3 = large prime g
    M4 = large prime h
    While data in file
        read first block of data into T1
        read second block of data into T2
        read third block of data into T3
        read fourth block of data into T4
        TSS = MSQ of T1 ored with MSQ of T2 ored with MSQ of T3 ored with MSQ of T4
        M1 = (((M1 ^ T1) + (M2 ^ T2) - (M3 ^ T3) + (M4 ^ T4))**2 + TSS) mod N1
        M2 = (((M2 ^ T1) - (M3 ^ T2) + (M4 ^ T3) - (M1 ^ T4))**2 - TSS) mod N2
        M3 = (((M3 ^ T1) + (M4 ^ T2) - (M1 ^ T3) + (M2 ^ T4))**2 + TSS) mod N3
        M4 = (((M4 ^ T1) - (M1 ^ T2) + (M2 ^ T3) - (M3 ^ T4))**2 - TSS) mod N4
    Endwhile
    Checksum = M1 concatenated with M2 concatenated with M3 concatenated with M4
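A minimal C sketch of MDC4T follows. It makes these assumptions:

- Blocks are 16 bits, giving a 64 bit checksum.
- The primes and seeds are hypothetical.
- MSQ means the most significant quarter (top 4 bits) of a block, so ORing the four shifted quarters concatenates them into a 16-bit TSS.

TSS is alternately added and subtracted, so each result is reduced into [0, N) even when it is negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants: the actual primes and seeds are not published. */
static const int64_t MDC4T_N[4] = {65521, 65519, 65497, 65479};
static const int64_t MDC4T_SEED[4] = {257, 263, 269, 271};

/* Reduce v into [0, n) even when v is negative. */
static int64_t mdc4t_mod(int64_t v, int64_t n)
{
    int64_t r = v % n;
    return r < 0 ? r + n : r;
}

/* Most significant quarter (top 4 bits) of a 16-bit block (assumed meaning of MSQ). */
static int64_t msq(int64_t t) { return (t >> 12) & 0xF; }

uint64_t mdc4t(const uint16_t *blocks, size_t count)
{
    int64_t m1 = MDC4T_SEED[0], m2 = MDC4T_SEED[1], m3 = MDC4T_SEED[2], m4 = MDC4T_SEED[3];
    for (size_t i = 0; i < count; i += 4) {
        int64_t t[4], s;
        for (size_t j = 0; j < 4; j++)
            t[j] = (i + j < count) ? blocks[i + j] : 0;   /* pad a short final group with 0 */
        /* TSS: the four top nibbles concatenated into one 16-bit value. */
        int64_t tss = (msq(t[0]) << 12) | (msq(t[1]) << 8) | (msq(t[2]) << 4) | msq(t[3]);
        s = (m1 ^ t[0]) + (m2 ^ t[1]) - (m3 ^ t[2]) + (m4 ^ t[3]);
        m1 = mdc4t_mod(s * s + tss, MDC4T_N[0]);
        s = (m2 ^ t[0]) - (m3 ^ t[1]) + (m4 ^ t[2]) - (m1 ^ t[3]);
        m2 = mdc4t_mod(s * s - tss, MDC4T_N[1]);
        s = (m3 ^ t[0]) + (m4 ^ t[1]) - (m1 ^ t[2]) + (m2 ^ t[3]);
        m3 = mdc4t_mod(s * s + tss, MDC4T_N[2]);
        s = (m4 ^ t[0]) - (m1 ^ t[1]) + (m2 ^ t[2]) - (m3 ^ t[3]);
        m4 = mdc4t_mod(s * s - tss, MDC4T_N[3]);
    }
    /* Checksum = M1 || M2 || M3 || M4, 16 bits each. */
    return ((uint64_t)m1 << 48) | ((uint64_t)m2 << 32) | ((uint64_t)m3 << 16) | (uint64_t)m4;
}
```

When all four top nibbles are zero, TSS is zero. For four zero blocks, the first component is therefore (257 + 263 - 269 + 271)² mod 65521 = 10400.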

The QCMDCV4 algorithm appears very strong in terms of defeating forgery attacks in
that it provides noninvertibility and, at 128 bits, is long enough to defeat birthday attacks
[JUE86]. The MDC4T algorithm maintains the noninvertibility property but, for checksum
lengths of less than 128 bits, does not protect against birthday attacks [JUE86].


2.4.3  Comparison  of  Cryptographic  and  noncryptographic  algorithms 

The theoretical difference between cryptographic and noncryptographic algorithms is that
with cryptographic algorithms the attacker does not possess the total algorithm and thus
cannot perform the attacks described in section 1.3.3. A drawback of cryptographic
algorithms is that the key must be provided to the checksum algorithm each time the
algorithm is used.

A  practical  disadvantage  of  cryptographic  algorithms  is  that  they  are  designed  to  conceal 
the  identity  of  the  key.  This  makes  cryptographic  checksums  very  complex  computation- 
ally, effectively  eliminating  their  use  on  present  microcomputers. 

The  strengths  of  cryptographic  algorithms  are  in  the  substitution  and  transposition  of 
blocks  of  data.  Typically  little  is  provided  in  terms  of  feedback  mechanisms.  Noncrypto- 
graphic  algorithms  generally  provide  little  (compared  to  cryptographic  algorithms)  in 
terms  of  substitution  or  transposition,  but  provide  very  strong  feedback  mechanisms.  For 
example,  DES  encryption  alone  without  feedback  requires,  per  64  bit  block  of  data,  2 
transpositions  each  of  64  bits,  16  transpositions  each  of  32  bits,  16  transpositions  com- 
bined with  substitutions  each  of  48  bits,  16  transpositions  with  substitution  each  of  48 
bits,  16  permutations  each  of  32  bits,  and  16  permutations  each  of  48  bits.  In  contrast, 
MDC4,  using  16  bit  blocks  with  a  64  bit  checksum,  has  32  substitutions  each  of  16  bits, 
and  16  transpositions  each  of  16  bits. 

Since nonlinear feedback is the primary mechanism for preventing trap doors, and because
of the large computational complexity of cryptographic algorithms, this research has
focused on noncryptographic algorithms. The noncryptographic algorithms MDC2,
MDC2T, MDC4, and MDC4T were selected for further study because of their ability to
provide protection from forgery while executing efficiently on small computers.
2.5  Conclusions

In this chapter we have described the features that a checksum algorithm must have in
order to detect errors and to defeat attempted forgeries by an attacker. These features
include even mapping, permutation sensitivity, overdeterminism, length, and
noninvertibility. A general basis for the construction of these checksum algorithms was
provided, and examples of both cryptographic and noncryptographic algorithms were
presented. Noncryptographic checksum algorithms were shown to be better suited for the
detection of change in the small computer environment because of their lower
computational intensity. Four noncryptographic algorithms (MDC2, MDC2T, MDC4,
MDC4T) were chosen for further study and testing in chapter 3.
Chapter  3       Testing  of  Checksum  Algorithms

This chapter describes methods of testing checksum algorithms. These testing methods
fall into three areas: statistical tests for even mapping, mutation tests for forgery
protection, and computational complexity tests for efficiency. The checksum algorithms
tested were MDC2, MDC2T, MDC4, and MDC4T, each producing a 32 bit checksum.

3.1  Statistical  Tests 

Statistical tests are used to determine whether a checksum algorithm produces checksums
that map evenly over the range of the checksum, i.e., whether the checksums are evenly
distributed between 0 and $2^n - 1$, where n is the number of bits in the checksum. The even
mapping of checksums produced by a checksum algorithm is important because of the
protection it provides against brute force attacks. To perform this test we use the null
hypothesis that the distribution of checksums from a checksum algorithm is an even
distribution, with the alternative hypothesis that the checksums are not evenly distributed.
This section is organized as follows: descriptions of the statistical tests, a description of
the generation of simulated programs, and the results of the statistical tests.

3.1.1  Description  of  Statistical  Tests 

This  null  hypothesis  is  tested  using  several  statistical  tests  including  Chi-square,  Colli- 
sion, and  Binomial  tests. 

3.1.1.1  Chi-square  Test. 
The chi-square test, which is based on the chi-square statistic, provides a measure of the
goodness of fit between observed data and the expected values of that data. The chi-square
statistic is used to test whether the data contradict the null hypothesis that the checksums
produced by a checksum algorithm are evenly distributed. The chi-square statistic is also
used to determine the statistical significance of the results of other statistical tests; in this
work, it is employed to evaluate the results of the binomial test.

The chi-square ($\chi^2$) statistic is a measure of the difference between the observed values and
the expected values. It is expressed as:

$U = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

where the sum is taken over all categories. The statistic U is examined to determine the
confidence we have in the fit that U describes. In order to evaluate U, it is necessary to
know the number of degrees of freedom. For our applications, the number of degrees of
freedom is one less than the number of possible outcomes.

For a large number of degrees of freedom v, the following values are calculated using the
formula given in Knuth [KNU81]:

$\chi^2 \approx v + \sqrt{2v}\, x_p + \tfrac{2}{3}(x_p^2 - 1) + O(1/\sqrt{v})$        { 1 }

where $x_p$ = -2.33 (p = 1%), -1.64 (5%), -.675 (25%), 0 (50%), .675 (75%), 1.64 (95%), 2.33 (99%).

For 99 degrees of freedom, evaluating { 1 } gives:

        p=.01   p=.05   p=.25   p=.50   p=.75   p=.95   p=.99
v=99    69.23   77.04   89.14   98.33   108.14  123.23  134.64

where p is the probability, under the null hypothesis, that the chi-square statistic is less
than or equal to the tabulated value. If p is very close to 1, the statistic is larger than would
normally occur by chance, so either an extreme (unlikely) event has been measured or the
null hypothesis is false.

It is desirable that there be five or more expected observations per category [KNU81];
therefore, the checksums are sorted into a smaller number of distinct categories. For this
work a value of 100 categories was chosen, resulting in 99 degrees of freedom.

The chi-square statistic is well suited to examining the overall distribution of the
checksums.
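The chi-square computation and the grouping into categories can be sketched in C as follows. The function names are hypothetical. Mapping a 32-bit checksum to a category by taking the integer part of checksum × k / $2^{32}$ is one simple way to form k equal-sized ranges. It is not necessarily the method used in this work.

```c
#include <stddef.h>
#include <stdint.h>

/* U = sum over categories of (observed - expected)^2 / expected,
   where each of the k categories has the same expected count total / k. */
double chi_square_uniform(const unsigned long *observed, size_t k, unsigned long total)
{
    double expected = (double)total / (double)k;
    double u = 0.0;
    for (size_t i = 0; i < k; i++) {
        double d = (double)observed[i] - expected;
        u += d * d / expected;
    }
    return u;
}

/* Category (0 .. k-1) of a 32-bit checksum when [0, 2^32) is cut into k equal ranges. */
size_t checksum_category(uint32_t checksum, size_t k)
{
    return (size_t)(((uint64_t)checksum * k) >> 32);
}
```

With 512,000 checksums and k = 100, each category's expected count is 5120, well above the five-per-category guideline. The resulting U is compared with the v = 99 row of the table.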

3.1.1.2  Collision  Test 

The  Collision  test  is  applicable  when  the  number  of  possible  outcomes  of  observations  is 
much  larger  than  the  number  of  observations  taken.  For  instance,  suppose  there  are  m 
urns  and  we  throw  n  balls  at  random  into  those  urns,  where  m  is  much  greater  than  n. 
Most  of  the  balls  will  land  in  urns  that  were  previously  empty,  but  if  a  ball  falls  into  an 
urn  that  already  contains  at  least  one  ball  we  say  that  a  "collision"  has  occurred 
[KNU81]. 

If the checksums generated by the checksum algorithm map evenly over their domain (the
null hypothesis), then it should be possible to predict the number of collisions (multiple
observations of a checksum). The number of collisions depends on the number of
observations taken (programs checksummed), n, and the number of possible values the
checksum can take, m (for a 32 bit checksum m is $2^{32}$).

The probability that a given possible checksum will be observed exactly k times is:

$p_k = \binom{n}{k} m^{-k} (1 - m^{-1})^{n-k}$

so the expected number of collisions (multiple observations of a possible checksum) per
possible checksum is:

$\sum_{k \ge 1} (k-1) p_k = \sum_{k \ge 1} k p_k - \sum_{k \ge 1} p_k = n/m - 1 + p_0$, since
$p_0 = (1 - m^{-1})^n = 1 - n/m + \binom{n}{2} m^{-2}$ + smaller terms.

Evaluating the equation shows that the average total number of collisions, taken over all
m possible checksums, is very slightly less than $n^2/(2m)$ [KNU81]. For a 32 bit checksum
and 512,000 observations the expected number of checksum collisions is 30.5.

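The probability that exactly c collisions occur is the probability that exactly n - c distinct
checksums are generated in n tests with m possible checksums, i.e.

$\frac{m(m-1)\cdots(m-n+c+1)}{m^n} \left\{ {n \atop n-c} \right\}$

where $\left\{ {n \atop n-c} \right\}$ denotes a Stirling number of the second kind.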

An approximation [KNU81] of the probabilities for different numbers of collisions, c, is
shown below; each probability is that of observing at most the given number of collisions.

Probability                  .99   .94   .71   .44   .24   .05   .01
Collisions (at most)         43    39    33    29    26    21    17
3.1.1.4  Binomial  Test

Under the null hypothesis that checksums map evenly over the interval, the number of one
bits in a checksum should follow a binomial distribution. The observed number of one
bits can be compared to the expected number based on an even mapping of checksums,
and the difference can be tested for significance using the chi-square statistic.

If there is an even mapping, the probability of any bit in a checksum having a value of
one is .50. Thus the expected probability distribution is given by the formula:

$p(x) = \binom{n}{x} (.5)^x (1 - .5)^{n-x}$

where x is the number of one bits in the checksum and n is the number of bits in the
checksum. The observed versus the expected results are evaluated for significance using
the chi-square statistic with the appropriate degrees of freedom.
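The expected counts behind the binomial test can be sketched in C. The names are hypothetical. One function counts the one bits of a checksum. The other computes total × C(n, x) / $2^n$ for each x. The binomial coefficients are built incrementally in double precision, which is exact for n = 32.

```c
#include <stdint.h>

/* Number of one bits in a checksum (each pass clears the lowest set bit). */
int one_bits(uint64_t c)
{
    int k = 0;
    while (c) {
        c &= c - 1;
        k++;
    }
    return k;
}

/* Expected number of checksums, out of total, having x one bits (x = 0..nbits)
   when each bit is independently 1 with probability .5:
   expected[x] = total * C(nbits, x) / 2^nbits. */
void binomial_expected(int nbits, double total, double *expected)
{
    double scale = total;
    for (int i = 0; i < nbits; i++)
        scale /= 2.0;                       /* total / 2^nbits */
    double c = 1.0;                         /* C(nbits, 0) */
    for (int x = 0; x <= nbits; x++) {
        expected[x] = c * scale;
        c = c * (nbits - x) / (x + 1);      /* C(nbits, x+1) from C(nbits, x) */
    }
}
```

For 512,000 32-bit checksums, the expected count peaks at x = 16 at about 71,654.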

Since the expected number of observations at the ends of the scale is close to zero, the
number of degrees of freedom for the chi-square test is reduced. Chi-square values for 26
and 36 degrees of freedom:

        .01     .05     .25     .50     .75     .95     .99
v=26    12.15   15.30   21.50   26.00   30.50   38.95   45.75
v=36    19.18   23.21   29.91   36.00   41.36   51.04   58.72

3.1.2  Simulation  of  Executable  Programs 

In order to provide a sufficient number of programs to test the properties of the
checksum algorithms, it was necessary to simulate a series of executable programs. These
simulated executable programs were generated by concatenating a series of random
numbers. The end of each program was determined by a specific terminator string chosen
at random, thus providing programs of varying lengths. Since appending executable
statements to the end of a program yields a new program, computation time was reduced
by generating new programs in this way.

3.1.3  Results  of  Testing 

This section presents the results of the statistical tests (chi-square, Kolmogorov-Smirnov,
collision, and binomial) when applied to the four checksum algorithms MDC2,
MDC2T, MDC4, and MDC4T.

3.1.3.1  Chi-square  Test. 

The chi-square test checks whether the checksums generated by a checksum algorithm
are evenly distributed. The checksums for 512,000 observations were partitioned into 100
distinct equal-sized categories and plotted for each algorithm.

[Figure: number of checksums per category for MDC2, MDC2T, MDC4 and MDC4T; the expected number of checksums per category, assuming even distribution, is 5120]
After segmenting the checksums into 100 even groups, the chi-square values and their
corresponding p-values were calculated for each of the algorithms:

            chi-square   p-value
MDC2:       106.27       .70
MDC2T:      101.97       .59
MDC4:       11347        1.00
MDC4T:      11419        1.00

The chi-square values for MDC2 and MDC2T do not fall in a range that would justify
rejecting the null hypothesis that the checksums are evenly distributed. The chi-square
values for MDC4 and MDC4T clearly provide evidence against the null hypothesis.
This is also indicated by the graphical representation, especially at the high end of the
interval (the circled areas on the MDC4 and MDC4T graphs).

The reason MDC4 and MDC4T have high chi-square values is the prime numbers used in
the algorithms. Because the testing was done for 32 bit checksums, each of the four
concatenated components is 8 bits long, and the prime numbers used for the modulo
operation were significantly less than $2^8$. A component reduced modulo N can only take
values from 0 to N-1, so the largest 8 bit values never occur. Furthermore, in order to
reduce the probability of trap door attacks, a different prime was chosen for each
equation, eliminating even more potential checksum values. The graph of the distribution
of checksums clearly shows a significant decrease in observed checksums near the maximum
possible checksum.

Further tests were conducted using 64 bit checksums to determine whether the choice of
prime numbers used in the 32 bit MDC4 and MDC4T algorithms was the reason for the
large chi-square values or whether there is an inherent flaw in those algorithms. The
results for 64 bit MDC4 and MDC4T checksums were:

            chi-square   p-value
MDC4:       89.1         .25
MDC4T:      119.6        .90
The values obtained for the MDC4 and MDC4T 64 bit checksum algorithms do not
reject the null hypothesis.


3.1.3.2  Collision  Test 

The collision test measures how many times there are multiple occurrences of a checksum
(collisions) in a series of observations. With 512,000 observations the following
results were obtained:

            Collisions observed   p-value
MDC2:       25                    .20
MDC2T:      33                    .71
MDC4:       33                    .71
MDC4T:      38                    .90

The collision test results for all four checksum algorithms indicate no evidence to reject
the null hypothesis.


3.1.3.4  Binomial  Test 

The binomial test checks whether each bit of a checksum has a .50 probability of being
one. The observed versus predicted results are shown graphically below.


[Figure: binomial distribution of the number of one bits per checksum, observed versus expected, for 32 bit MDC2, MDC2T, MDC4 and MDC4T and for 64 bit MDC4 and MDC4T.
X axis: number of one bits; Y axis: number of checksums, out of 512,000, which contain that number of one bits.
The bars show the actual values; the lines are the expected values assuming even distribution.]


Chi-square values for 26 and 36 degrees of freedom:

        .01     .05     .25     .50     .75     .95     .99
v=26    12.15   15.30   21.50   26.00   30.50   38.95   45.75
v=36    19.18   23.21   29.91   36.00   41.36   51.04   58.72


            chi-square   p-value
MDC2:       31.196       .77
MDC2T:      26.071       .50
MDC4:       19,267       1.00
MDC4T:      19,346       1.00


The  chi-square  statistics  from  the  binomial  test  for  the  MDC2  and  MDC2T  algorithms  do 
not  provide  evidence  to  reject  the  null  hypothesis.  The  chi-square  values  for  the  MDC4 
and  MDC4T  algorithms  provide  evidence  to  reject  the  null  hypothesis. 

Along with the 32 bit MDC4 and MDC4T, the 64 bit MDC4 and MDC4T were tested
using the binomial test. The following results were obtained:

                  chi-square   p-value
64 bit MDC4:      47.2         .87
64 bit MDC4T:     39.0         .61

The  chi-square  statistics  for  the  binomial  tests  on  64  bit  MDC4  and  MDC4T  do  not 
provide  evidence  to  reject  the  null  hypothesis. 


3.1.4  Conclusions  from  statistical  testing 

The results of the chi-square, collision, and binomial tests are used to test the null
hypothesis that the checksums are evenly distributed.


For the MDC4 and MDC4T checksum algorithms with 32 bit checksums there is
evidence to reject the null hypothesis that the checksums are evenly distributed. Of the three
tests, only the collision test was consistent with even mapping; the results of the other
two tests (chi-square and binomial) provide evidence to reject the null hypothesis.
Extending the MDC4 and MDC4T algorithms to 64 bit checksums gives statistical
results that do not provide evidence to reject the null hypothesis. Thus we conclude
that it is the relatively small primes used in the 32 bit four-equation checksums that cause
the uneven distribution, and not the form of the algorithm.

The null hypothesis cannot be rejected for the MDC2 and MDC2T checksum algorithms.
None of the three tests (chi-square, collision, and binomial) provides evidence to
reject the null hypothesis.

3.2  Mutation  Testing 

To simulate an actual attack on a given checksum, a mutation test was used. Given a file
of blocks $F = F_0, F_1, \ldots, F_n, F_{n+1}, \ldots$, the attacker attempts to determine a $V$ consisting of
$V_0, V_1, \ldots, V_k$ such that, when it is inserted into the file to give the resulting file
$F' = F_0, F_1, \ldots, F_n, V_0, V_1, \ldots, V_k, F_{n+1}, \ldots$, the checksum of $F'$ is equal to the checksum of $F$.
This insertion attack can be stated more precisely: given a checksum function $C()$, the
attacker must find a $V$ such that

$C(F_0, F_1, \ldots, F_n) = C(F_0, F_1, \ldots, F_n, V_0, V_1, \ldots, V_k)$

Every block after the insertion is then processed from the same intermediate value, so the
checksums of F and F' agree.
Note: If the checksum algorithm is invertible (in the sense that, given a resultant
checksum $Y_i$, a block of data $X_i$, and the checksum algorithm $C()$, the value of $Y_{i-1}$ can
be found), this is equivalent to the birthday attack.

In a real attack the first part of V would be the virus, while the last part would be filler to
make the equation above true. A mutation attack determines how many different fillers
must be examined before finding a filler that makes the equation above true.

In a test of approximately 64,000,000 ($2^{26}$) mutations, no mutation was found such that
C(F) was equal to C(F') for any of the checksum algorithms MDC2, MDC2T, MDC4, and
MDC4T. The CPU time necessary to check $2^{24}$ mutations was 624 seconds on a Harris
HC-9 computer. Extrapolating this result gives the time (in CPU seconds) necessary to
generate a forgery with the following probabilities:

Probability     12.5%    25%      50%       75%       87.5%
CPU seconds     21,300   46,000   110,700   221,500   332,200

For  virus  protection  on  small  computers  these  times  should  provide  adequate  protection 
against  a  mutation  attack. 


3.3  Efficiency  Testing 

The efficiency test used in this work is the time a given checksum algorithm takes to
generate a checksum for a file of fixed length. In order for a checksum algorithm to be
usable, it must execute in a reasonable amount of time. The time taken for file access and
the checksum algorithm was measured on three different types of computers within the
IBM PC family. The computers tested were:


1)  IBM  PC  with  8088  processor,  4.77  MHz  clock  speed,  20  megabyte  hard  drive  with 
access  time  80  milliseconds.  This  type  of  computer  represents  the  slowest  type  of  ma- 
chine on  which  checksum  protection  can  be  expected  to  be  used. 



2)  IBM  PC  clone  with  8088  processor,  10  MHz  clock  speed,  40  megabyte  hard  drive  with 
an  access  time  of  65  milliseconds.  This  computer  represents  the  current  entry  computer. 

3) IBM PC clone with 80286 processor, 10 MHz clock speed, 40 megabyte hard drive
with  28  millisecond  access  time.  This  type  of  computer  represents  the  current  mid-to- 
high  range.  In  the  future  this  type  of  computer  will  represent  the  entry  computer. 

The  algorithms  tested  were  the  32  bit  versions  of  MDC2,  MDC2T,  MDC4,  MDC4T.  The 
implementations  of  these  algorithms  were  written  in  the  computer  language  C,  and 
compiled  with  the  Turbo  C  compiler. 

Results, in seconds, for 32 bit checksums:

Computer:   1       2       3
MDC2        32.9    17.1    4.6
MDC2T       40.4    21.1    4.7
MDC4        62.8    32.3    7.9
MDC4T       63.5    32.5    7.9

Results, in seconds, for 64 bit checksums:

Computer:   1       2       3
MDC4        40.2    19.0    4.7
MDC4T       40.2    21.1    4.9

The dramatic differences between the times for these three machines are mainly due to
three factors: disk access speed, clock speed, and different execution times for operations.


Disk Access: In order to separate the cost of file access from the cost of the algorithms, a
copy of the tested programs with the checksum computation replaced by null code was run
on the different computers. The results were:

              time           published access speed
computer 1:   10 seconds     80 ms
computer 2:   7.4 seconds    65 ms
computer 3:   2.4 seconds    28 ms

This difference can be explained by the speed of disk access and represents a lower
bound on the time in which any algorithm that examines the entire executable file can
execute.

Clock  Speed:  The  speed  of  the  CPU  is  an  important  factor  in  the  elapsed  time  to  execute 
a  checksum  algorithm  on  a  program.  The  faster  cycle  time  is  the  primary  difference 
between  the  two  8088  based  machines. 

Execution Time for Operations: Certain instructions take significantly less time to
execute on an 80286 (and 80386) processor than on the 8088 processor. The two instructions
whose speed increased most significantly are division and multiplication. The 8088
processor takes between 144 and 166 clock cycles to execute a divide (for 16 bit divides),
compared with 22 for an 80286 or 80386 processor. Since the modulo operation requires a
divide, and divides are performed 6 times for each checksum in the MDC2 and MDC2T
algorithms and 12 times in the MDC4 and MDC4T algorithms, division greatly increases the
time necessary for execution. A similar difference can be observed for multiplication,
which occurs two or four times in calculating a checksum, depending on the algorithm.

3.4  Conclusions 

The checksum algorithms were examined from three perspectives: even mapping of checksums,
insertion of filler data such that the same checksum is obtained, and speed of execution.
The two algorithms MDC4 and MDC4T with 32 bit checksums do not provide even mapping and can
therefore be eliminated. In their place we also considered the MDC4 and MDC4T algorithms
with 64 bit checksums. These two algorithms with 64 bit checksums did provide even mapping
and are acceptable checksum algorithms.

All  the  algorithms,  as  expected  from  an  even  mapping  perspective,  provided  protection 
against  a  mutation  attack. 

The efficiency of the checksum algorithms varied greatly from computer to computer. The
rank ordering of efficiency was: 1) MDC2, 2) MDC2T, 3) MDC4 (64 bit), and 4) MDC4T
(64 bit). On slower machines (and especially those with 8088 processors) the differences
are significant. On machines based on the 80286 (and later generations of the 80x86 family)
the differences in execution speed are not significant.
Chapter  4       Implementation

A virus detection program should reliably inform the user that a virus has entered the
system. This function can be broken into two parts: detecting the virus and informing the
user. Both parts rely on the operating system to ensure that the virus detection code is
executed before each program is started and that neither the virus detection code nor the
checksum values have been changed. Unfortunately, on most microcomputers the operating
system provides only minimal protection at best.

4.1  Goals 

The goals for implementing a virus detection mechanism using checksums are: 1) that the
virus detection mechanism is executed before each program is executed and is not changed by
a virus, 2) that the checksum generated by running the checksum algorithm is stored in a
place not easily modifiable by the user, and 3) that the virus detection feature can be
implemented without major changes in system operation.

4.1.1  Protected  Operating  System/Checksum  Routine 

Changing an operating system to execute a virus detection program before the execution of
user programs is relatively simple. Ensuring that neither the code that calls the virus
detection program nor the virus detection program itself has been changed is difficult on
small and personal computer operating systems. If a virus knows the location and operation
of the virus detection code and has the ability to change that code, it can disable the
virus detection routine. One change that would nullify the work of a detection program is
to modify the return value of the comparison of old and new checksums so that it always
indicates no virus.
4.1.2  Protected  Checksum  Storage

The checksums for programs should be stored in a location not readily accessible to the
user or to a virus operating on behalf of a user. If a virus has "write" access to the
location used to store checksums and knows the checksum algorithm, the virus can calculate
a checksum on the infected program and insert the new checksum so that the virus detection
algorithm does not detect the virus.

4.2  Protection  Features 

Ideally, users would be prevented from modifying either the virus detection code or the
checksums. This implies that a user's programs are limited to accessing only the memory
allocated to them. Methods of limiting programs to set memory ranges include bounds
checking and virtual memory.

4.2.1  Bounds  Checking 

A system that employs bounds checking compares each requested address with the bounds
register(s). If the address is not within the accessible memory, the operation is not
allowed to execute and the job is terminated with an appropriate error message. [DIE84]
Bounds checking can be done at all levels of memory, most particularly main memory and
secondary (disk) storage. Most microcomputers do not employ bounds checking, thereby
severely limiting its use in protecting the virus detection routine against modification
by viruses.
4.2.2  Virtual  Memory

The typical microcomputer is designed to be a single-user system, with that user having
total control over all available memory locations. Such systems can nonetheless use a
virtual memory operating system.

When using an operating system employing virtual memory, each user process has a private
address space that contains its programs and data. Each word in the process's address space
has a fixed virtual address that the programs in the process use to access that word. In
executing a memory reference instruction, the hardware computes the virtual address that
identifies the target location of the reference by using a value or offset contained in a
field of the instruction plus some index registers and address registers. The virtual
address is then translated, or mapped, by hardware into a physical address. This
translation is transparent to the program. [GEL88] The translation from virtual to physical
addresses can range from simply adding a nonmodifiable base value to every virtual address
to demand paging or segmentation schemes.

Virtual memory systems are slower than nonvirtual memory systems because there is at least
one address conversion for each memory reference. Even if these functions are implemented
in hardware or microcode there is a performance penalty. For this and other reasons, most
microcomputer operating systems do not implement virtual memory.
4.2.3  Ignorance

One of the strongest protection features is the virus's lack of knowledge of the exact
location of the virus detection code. If the virus does not know where the virus detection
code is stored, then it must either search for that code or modify code at random.

For the virus to search for the virus detection portion of the kernel, it must have a
pattern to search against. This implies that the virus designer knew a good deal about the
virus detection code and that the virus carries enough telltale parts of the virus
detection code to identify it. Since the virus detection code is not trivial, this often
increases the size of the virus significantly. If this threat is considered serious enough,
multiple copies of the virus detection code can be used. A further step is to have
different implementations of the virus detection code in several different locations, so
that a virus would need to carry information on each virus detection implementation in
order to disable all of the virus detectors.

4.3  MINIX  Example 

MINIX is an operating system that is a subset of UNIX Version 7 (V7). MINIX was developed
by Tanenbaum and is described in his Operating Systems textbook. [TAN87] MINIX contains
nearly all the V7 system calls, and these calls are identical to the corresponding V7
calls. MINIX was originally written for the IBM PC, XT, and AT and has since been ported to
the NS 16032 and the 68000. The version of MINIX used in this research is version 1.1 for
the IBM XT. For further details refer to the textbook, which includes most of the operating
system source code.
4.3.1  General  Description

MINIX is a layered operating system in which communication between layers is accomplished
by message passing, thus insulating the kernel from the users.

When a process is created, its cs register is set to the base address of the process. The
process is allocated the amount of space specified in the header at the start of the
program file to be run. This amount of space is typically 64K.

There is no checking for attempts to read or write outside the memory requested. All
addresses are physical addresses (no virtual memory), and instructions can read or write
areas in the operating system space by changing the register values.

4.3.1.1  MINIX  Protection  Mechanisms

MINIX protection mirrors UNIX protection and is a variation of an access control list based
system. Each user has a domain in which it can operate (the files to which it has certain
access rights), defined by its user id (uid) and group id (gid). If an object is not in the
domain of a process, then the process is refused access to that object. The rights that
processes may possess are read, write, and execute.


4.3.2  Implementation 

For the purposes of this thesis, it was deemed that the operating system provides adequate
protection for both the checksum routine and the storage of checksums. The checksums are
stored in a file readable by all, appendable by users, and changeable by root. When a
process requests to execute a new program, an execve call to the operating system is made.
The calling process passes the name of the program along with its proper checksum. The
do_exec procedure in the memory manager calculates a checksum and compares it with the
checksum passed. If the calculated and passed checksums differ, execution of the program is
terminated.

4.3.3  Weaknesses

The weakness of this approach is that user processes cannot be limited to their proper
memory locations. Since there is no bounds checking or virtual memory, a virus potentially
has the ability to change any memory location.

4.4  Suggestions  for  other  Operating  Systems 

The weaknesses of MINIX are present in all operating systems that do not isolate a user
process in its own memory area. Most operating systems do not provide even the protection
mechanisms of MINIX. Thus any file can be modified or executed by the user or by a virus
acting on his or her behalf, instead of just those files to which the process has "write"
access.

Ideally,  users  will  have  the  operating  system  source  code  to  directly  incorporate  the  virus 
detection  mechanism,  and  then  recompile  the  operating  system.  This  is  not  the  case  with 
most  operating  systems.  Instead  the  virus  detection  mechanism  must  be  added  on  top  of 
the  operating  system.  Placing  the  virus  detection  mechanism  outside  the  operating 
system  makes  its  location  better  known  and  easier  to  disable. 
4.5  Conclusions

Modest protection by checksums can be provided even with an insecure operating system.
However, with such operating systems a virus can either attack the checksum mechanism or
determine programs that yield the same checksum. When a virus is limited to the section of
memory it is allocated (with either bounds checking or virtual memory), only brute force or
trap door attacks are feasible.
Chapter  5       Conclusions

This chapter briefly reviews the previous four chapters, draws conclusions with respect to
particular classes of computers, and discusses future research possibilities.

5.1  Review 

The virus problem is considered a subset of a larger integrity issue. Virus
detection/prevention can be directly classified under the control of change of static data
in the Clark & Wilson integrity model. The only method that currently shows promise for
detecting any but the simplest viruses is a checksum technique. These checksums can be
generated by either cryptographic or noncryptographic algorithms.

Checksum algorithms must have the properties of even mapping, permutation sensitivity, and
overdeterminism. To protect against an active attacker, as opposed to merely detecting
random errors, a checksum algorithm must produce checksums of adequate length and must be
noninvertible. The active attacker can employ several different types of attacks, including
the brute force attack and the trap door attack. Another attack, the birthday attack, was
deemed not applicable to the virus problem when a strong checksum algorithm is employed.
The trap door attack was deemed to be the most serious threat.

Checksum algorithms employ the techniques of substitution, transposition, and feedback to
produce checksums with the strength necessary to deter attackers. Both cryptographic and
noncryptographic checksum algorithms employ these mechanisms. Cryptographic algorithms
typically employ large amounts of substitution and transposition, making them very
computationally complex. The computational complexity of cryptographic algorithms limits
their use to fast computers. Noncryptographic algorithms can provide adequate protection
against attackers while executing fast enough for use on small computers.

Four specific noncryptographic algorithms were investigated. The tests employed included
statistical tests, efficiency tests, and a simulated attack. Two algorithms, MDC2 and
MDC2T, were shown to provide adequate protection with a checksum length of 32 bits or
greater, while two other algorithms, MDC4 and MDC4T, provided adequate protection at the
64 bit checksum length.

5.2  Particular  Conclusions 

This  section  discusses  the  effects  of  the  general  conclusions  as  they  apply  to  specific 
classes  of  computers. 

The basic trade-off with noncryptographic algorithms is efficiency versus trap door
protection. Trap door protection is provided by additional substitution and feedback, as
described in section 2.3. The four algorithms discussed differ in feedback either by adding
an extra history term (tss) or by increasing the number of equations in which a single data
block is directly used (four equations versus two equations).

The additional feedback provided by the extra tss term in the MDC2T and MDC4T algorithms
should make them more resistant to trap door attacks than the corresponding algorithms
without the tss term (MDC2 and MDC4). For example, the MDC2T algorithm should be more
resistant to trap door attacks than the MDC2 algorithm.

The four-equation algorithms, MDC4 and MDC4T, should provide more protection from trap door
attacks than the two-equation algorithms, MDC2 and MDC2T. When 16 bit data blocks are used
with the MDC4 and MDC4T algorithms (64 bit checksums), the effort required to mount a trap
door attack should not decrease.

For computers that have fast disk access and low CPU cycle time, the MDC4T (64 bit)
algorithm is suggested; its additional protection against trap door attacks comes with only
a small time penalty. For computers that have medium disk access time and medium CPU cycle
time, the recommended choices are MDC4T (64 bit) for best protection or MDC2 for faster
execution with less protection. For slow computers the MDC2 algorithm is recommended.

These results are summarized in the table below.

                        Disk Access Time
                        Fast      Slow
Computer Speed   Fast   MDC4T     MDC4T
                 Slow   MDC2      MDC2

As a review, the basic forms of the algorithms are:

MDC2: Two equations of the form

M1 = (M1'^T1 + M2^T2)**2 Mod N

MDC2T: Two equations with an additional feedback term, of the form

M1 = (M1'^T1 + M2^T2 - TSS)**2 Mod N

MDC4: Four equations of the form

M1 = (M1'^T1 - M2^T2 + M3^T3 - M4^T4)**2 Mod N

MDC4T: Four equations with an additional feedback term, of the form

M1 = (M1'^T1 - M2^T2 + M3^T3 - M4^T4 - TSS)**2 Mod N

5.3  Further  Research 

The virus detection/protection field offers areas for future research. It is desirable to
be able to prevent viruses from entering a computer system by examining the incoming
information. Though virus detection is undecidable in the general case, it may be possible
to partition programs into one of three categories: 1) the program does not contain a
virus, 2) the program contains a virus, and 3) it cannot be determined whether the program
contains a virus. If the third category can be reduced to a modest level, this would
represent significant progress in virus protection. Note that this would probably need to
be done at the object code level.

The  integrity  field  is  a  fertile  area  for  future  research.  There  is  a  need  for  work  at  all 
levels  including: 

1) General Models. The models for integrity are generally considered inadequate at the same
time that the need for integrity is increasing. Since lower level models depend on
theoretically sound higher level models, advances in this area are important.

2) Intermediate Concerns. The identification of integrity mechanisms that are common across
most, or at least many, applications is needed. These mechanisms provide the building
blocks that enable applications to maintain integrity.

3) Implementation Concerns. The actual implementation and study of the use of general
integrity mechanisms is needed.
Appendix  -  Chi-Square  MDC2

Appendix  -  Chi-Square  MDC2T

Appendix  -  Chi-Square  MDC4  (32  bit  checksum  length)

Appendix  -  Chi-Square  MDC4T  (32  bit  checksum  length)
[Chi-square table: the scanned values are not legible in the source.]
Appendix  -  Chi-Square  MDC4  (64  bit  checksum  length)


Chi-Square  MDC4  -  64  bit

[Chi-square table: the scanned values are not legible in the source.]
Appendix  -  Chi-Square  MDC4T  (64  bit  checksum  length)


Chi-Square  MDC4T  -  64  bit

[Chi-square table: the scanned values are not legible in the source.]
Appendix  -  Binomial  Test  MDC2


Binomial  Distribution  of  MDC2

k   observed   calculated   (o-c)^2/c     chi-square
0   0          0.0001       0.0001        0.0001
1   0          0.0038       0.0038        0.0039
2   0          0.059        0.059         0.0629
3   1          0.59         0.28491525    0.34781525
4   10         4.29         7.60002331    7.94783856
5   21         24           0.375         8.32283856
6   115        108.03       0.44969823    8.7725368
7   395        401.24       0.09704317    8.86957996
8   1295       1253.88      1.34849778    10.2180777
9   3377       3343.68      0.33203608    10.5501138
10  7697       7690.5       0.00549379    10.5556076
11  15161      15380.92     3.14446772    13.7000753
12  26765      26916.61     0.85395568    14.554031
13  41343      41410.16     0.10892171    14.6629527
14  56603      56199.51     2.89689679    17.5598495
15  67669      67439.4      0.78168192    18.3415314
16  71725      71654.37     0.06962027    18.4111517
17  67558      67439.4      0.20857184    18.6197236
18  56363      56199.51     0.47560877    19.0953323
19  41362      41410.16     0.05601006    19.1513424
20  26713      26916.61     1.54020258    20.691545
21  15243      15380.92     1.23672228    21.9282673
22  7563       7690.5       2.11380925    24.0420765
23  3311       3343.68      0.31940329    24.3614798
24  1221       1253.88      0.86219925    25.223679
25  369        401.24       2.59051341    27.8141925
26  96         108.03       1.33963621    29.1538287
27  22         24           0.16666667    29.3204953
28  2          4.29         1.22240093    30.5428963
29  0          0.59         0.59          31.1328963
30  0          0.06         0.06          31.1928963
31  0          0.003        0.003         31.1958963
32  0          0.0001       0.0001        31.1959963
Appendix  -  Binomial  Test  MDC2T


Binomial  Distribution  of  MDC2T

k   observed   calculated   (o-c)^2/c     chi-square
0   0          0.0001       0.0001        0.0001
1   0          0.0038       0.0038        0.0039
2   0          0.059        0.059         0.0629
3   1          0.59         0.28491525    0.34781525
4   9          4.29         5.17111888    5.51893414
5   27         24           0.375         5.89393414
6   116        108.03       0.58799315    6.48192729
7   432        401.24       2.35813379    8.84006107
8   1223       1253.88      0.76049893    9.60056
9   3380       3343.68      0.39451814    9.99507814
10  7779       7690.5       1.01843183    11.01351
11  15227      15380.92     1.5403088     12.5538188
12  26851      26916.61     0.15992624    12.713745
13  41345      41410.16     0.10253101    12.816276
14  56424      56199.51     0.89672953    13.7130055
15  67649      67439.4      0.65143166    14.3644372
16  71655      71654.37     5.5391E-06    14.3644427
17  67669      67439.4      0.78168192    15.1461247
18  56075      56199.51     0.27585187    15.4219765
19  41549      41410.16     0.4655028     15.8874793
20  26671      26916.61     2.24115415    18.1286335
21  15355      15380.92     0.04368051    18.172314
22  7546       7690.5       2.71507054    20.8873845
23  3278       3343.68      1.29015408    22.1775386
24  1217       1253.88      1.08474049    23.2622791
25  399        401.24       0.01250523    23.2747843
26  99         108.03       0.75479867    24.029583
27  22         24           0.16666667    24.1962497
28  2          4.29         1.22240093    25.4186506
29  0          0.59         0.59          26.0086506
30  0          0.06         0.06          26.0686506
31  0          0.003        0.003         26.0716506
32  0          0.0001       0.0001        26.0717506
Appendix  -  Binomial  Test  MDC4  (32  bit  checksum  length)


Binomial  Distribution  of  MDC4

k   observed   calculated   (o-c)^2/c     chi-square
0   0          0.0001       0.0001        0.0001
1   0          0.0038       0.0038        0.0039
2   0          0.059        0.059         0.0629
3   1          0.59         0.28491525    0.34781525
4   10         4.29         7.60002331    7.94783856
5   66         24           73.5          81.4478386
6   200        108.03       78.2975183    159.745357
7   729        401.24       267.736561    427.481918
8   1993       1253.88      435.686329    863.168247
9   4912       3343.68      735.604969    1598.77322
10  10507      7690.5       1031.48979    2630.26301
11  20073      15380.92     1431.35877    4061.62178
12  32861      26916.61     1312.78688    5374.40866
13  48784      41410.16     1313.04772    6687.45638
14  63562      56199.51     964.532591    7651.98897
15  72501      67439.4      379.893572    8031.88255
16  73673      71654.37     56.8683679    8088.75091
17  64639      67439.4      116.285734    8205.03665
18  49865      56199.51     713.992292    8919.02894
19  33459      41410.16     1526.70131    10445.7302
20  19362      26916.61     2120.33136    12566.0616
21  9284       15380.92     2416.78869    14982.8503
22  3792       7690.5       1976.24371    16959.094
23  1343       3343.68      1197.10034    18156.1943
24  305        1253.88      718.069715    18874.2641
25  61         401.24       288.513751    19162.7778
26  18         108.03       75.0291669    19237.807
27  0          24           24            19261.807
28  0          4.29         4.29          19266.097
29  0          0.59         0.59          19266.687
30  0          0.06         0.06          19266.747
31  0          0.003        0.003         19266.75
32  0          0.0001       0.0001        19266.7501
Appendix  -  Binomial  Test  MDC4T  (32  bit  checksum  length)


Binomial  Distribution  of  MDC4T

k   observed   calculated   (o-c)^2/c     chi-square
0   0          0.0001       0.0001        0.0001
1   0          0.0038       0.0038        0.0039
2   2          0.059        63.8556102    63.8595102
3   3          0.59         9.84423729    73.7037475
4   8          4.29         3.20841492    76.9121624
5   59         24           51.0416667    127.953829
6   190        108.03       62.1964352    190.150264
7   670        401.24       180.021776    370.172041
8   1901       1253.88      333.974778    704.146819
9   4784       3343.68      620.430694    1324.57751
10  10347      7690.5       917.624634    2242.20215
11  20058      15380.92     1422.22164    3664.42379
12  33126      26916.61     1432.44354    5096.86733
13  49214      41410.16     1470.65162    6567.51895
14  63374      56199.51     915.903124    7483.42207
15  72883      67439.4      439.398645    7922.82072
16  73417      71654.37     43.3590375    7966.17976
17  64515      67439.4      126.811854    8092.99161
18  50503      56199.51     577.411194    8670.4028
19  33198      41410.16     1628.5755     10298.9783
20  18870      26916.61     2405.50101    12704.4793
21  9358       15380.92     2358.47825    15062.9576
22  3851       7690.5       1916.8793     16979.8369
23  1238       3343.68      1326.05042    18305.8873
24  348        1253.88      654.463405    18960.3507
25  72         401.24       270.159948    19230.5106
26  10         108.03       88.9556688    19319.4663
27  1          24           22.0416667    19341.508
28  0          4.29         4.29          19345.798
29  0          0.59         0.59          19346.388
30  0          0.06         0.06          19346.448
31  0          0.003        0.003         19346.451
32  0          0.0001       0.0001        19346.4511
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Appendix  -  Binomial  Test  MDC4  (64  bit  checksum  length) 


Binomial Distribution of MDC4 - 64 bit

      observed   calculated     (O-C)^2/C    chi-square
  0          0            0             0             0
  1          0            0             0             0
  2          0            0             0             0
  3          0            0             0             0
  4          0            0             0             0
  5          0            0             0             0
  6          0            0             0             0
  7          0            0             0             0
  8          0            0             0             0
  9          0            0             0             0
 10          0        0.004         0.004         0.004
 11          0         0.02          0.02         0.024
 12          0         0.09          0.09         0.114
 13          0         0.36          0.36         0.474
 14          2         1.33     0.3375188     0.8115188
 15          6         4.43    0.55641084    1.36792963
 16          7        13.56    3.17356932    4.54149895
 17         46         38.3    1.54804178    6.08954073
 18        106        99.97    0.36371812    6.45325884
 19        238          242     0.0661157    6.51937455
 20        552       544.56     0.1016483    6.62102285
 21       1129      1140.98    0.12578696    6.74680981
 22       2241      2230.09    0.05337368    6.80018348
 23       3963      4072.34    2.93571647    9.73589995
 24       6811      6956.91    3.06022762    12.7961276
 25      11091     11131.05    0.14410163    12.9402292
 26      16795     16696.58    0.58014853    13.5203777
 27      23569     23498.89    0.20917635    13.7295541
 28      31342     31052.11    2.70629635    16.4358504
 29      38615     38547.45    0.11837365    16.5542241
 30      45371     44972.01    3.53982444    20.0940485
 31  illegible     49324.13     illegible     illegible
 32      50712     50865.53    0.46340736    20.8994025
 33      49416     49324.13    0.17111497    21.0705174
 34      45196     44972.01    1.11561658     22.186134
 35      38765     38547.45    1.22778556    23.4139196
 36      30825     31052.11      1.661045    25.0749646
 37      23290     23498.89    1.85689759    26.9318622
 38      16437     16696.58    4.03566338    30.9675255
 39      11075     11131.05    0.28223775    31.2497633
 40       6765      6956.91     5.2939377     36.543701
 41       3980      4072.34    2.09380248    38.6375035
 42       2140      2230.09    3.63940832    42.2769118
 43       1111      1140.98    0.78774422     43.064656
 44        562       544.56    0.55853092    43.6231869
 45        237          242    0.10330579    43.7264927
 46         94        99.97    0.35651595    44.0830087
 47         39         38.3    0.01279373    44.0958024
 48         14        13.56    0.01427729    44.1100797
 49          3         4.43    0.46160271    44.5716824
 50          0         1.33          1.33    45.9016824
 51          1         0.36    1.13777778    47.0394602
 52          0         0.09          0.09    47.1294602
 53          0         0.02          0.02    47.1494602
 54          0        0.004         0.004    47.1534602
 55          0            0             0    47.1534602
 56          0            0             0    47.1534602
 57          0            0             0    47.1534602
 58          0            0             0    47.1534602
 59          0            0             0    47.1534602
 60          0            0             0    47.1534602
 61          0            0             0    47.1534602
 62          0            0             0    47.1534602
 63          0            0             0    47.1534602
 64          0            0             0    47.1534602
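Each "(O-C)^2/C" entry is a Pearson chi-square term for one row, and the "chi-square" column is the running total of those terms. The sketch below, with hypothetical function names, recomputes both columns from the observed and calculated counts:

```c
#include <stddef.h>

/* Pearson term (O - C)^2 / C for one row.  A zero calculated count would
   divide by zero; in the tables such rows also have an observed count of 0,
   so they are treated as contributing nothing. */
double chi_square_term(double observed, double calculated)
{
    double d;

    if (calculated <= 0.0)
        return 0.0;
    d = observed - calculated;
    return d * d / calculated;
}

/* Running total over `rows` rows, like the "chi-square" column: if out is
   not NULL, out[i] receives the sum of the terms for rows 0..i.  Returns
   the final total. */
double chi_square_running(const double *observed, const double *calculated,
                          size_t rows, double *out)
{
    double total = 0.0;

    for (size_t i = 0; i < rows; i++) {
        total += chi_square_term(observed[i], calculated[i]);
        if (out != NULL)
            out[i] = total;
    }
    return total;
}
```

Fed the MDC4 rows k = 14 to 19 (observed 2, 6, 7, 46, 106, 238; calculated 1.33, 4.43, 13.56, 38.3, 99.97, 242), the running total reaches about 6.04537, which is the table's 6.51937455 minus the 0.474 accumulated over k = 10 to 13.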
Appendix  -  Binomial  Test  MDC4T  (64  bit  checksum  length)


Binomial Distribution of MDC4T - 64 bit

      observed   calculated     (O-C)^2/C    chi-square
 10          0        0.004         0.004         0.004
 11          0         0.02          0.02         0.024
 12          0         0.09          0.09         0.114
 13          0         0.36          0.36         0.474
 14          0         1.33          1.33         1.804
 15          4         4.43    0.04173815    1.84573815
 16         13        13.56    0.02312684    1.86886499
 17         34         38.3    0.48276762    2.35163262
 18        100        99.97    9.0027E-06    2.35164162
 19        231          242           0.5    2.85164162
 20        522       544.56    0.93461437    3.78625599
 21       1154      1140.98    0.14857438    3.93483037
 22       2238      2230.09    0.02805631    3.96288668
 23       3999      4072.34    1.32080219    5.28368887
 24       6986      6956.91     0.1216385    5.40532737
 25      11168     11131.05    0.12265712    5.52798449
 26      16762     16696.58    0.25632653    5.78431102
 27      23733     23498.89    2.33234387    8.11665489
 28      31182     31052.11    0.54332579    8.65998068
 29      38914     38547.45     3.4855458    12.1455265
 30      45219     44972.01    1.35648952     13.502016
 31      49325     49324.13    1.5345E-05    13.5020313
 32      50897     50865.53    0.01947018    13.5215015
 33      49248     49324.13    0.11750389    13.6390054
 34      45079     44972.01    0.25453299    13.8935384
 35      38478     38547.45     0.1251263    14.0186648
 36      30622     31052.11    5.95755368    19.9762184
 37      23194     23498.89    3.95584268    23.9320611
 38      16598     16696.58    0.58203635    24.5140975
 39      11148     11131.05    0.02581091    24.5399084
 40       6889      6956.91    0.66290467     25.202813
 41       4039      4072.34    0.27295255    25.4757656
 42       2238      2230.09    0.02805631    25.5038219
 43       1087      1140.98    2.55380497    28.0576269
 44        541       544.56     0.0232731       28.0809
 45        216          242    2.79338843    30.8742884
 46         94        99.97    0.35651595    31.2308044
 47         25         38.3    4.61853786    35.8493422
 48         19        13.56    2.18241888    38.0317611
 49          3         4.43    0.46160271    38.4933638
 50          1         1.33     0.0818797    38.5752435
 51          0         0.36          0.36    38.9352435
 52          0         0.09          0.09    39.0252435
 53          0         0.02          0.02    39.0452435
 54          0        0.004         0.004    39.0492435
 55          0            0             0    39.0492435
 56          0            0             0    39.0492435
 57          0            0             0    39.0492435
 58          0            0             0    39.0492435
 59          0            0             0    39.0492435
 60          0            0             0    39.0492435
 61          0            0             0    39.0492435
 62          0            0             0    39.0492435
 63          0            0             0    39.0492435
 64          0            0             0    39.0492435
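This part of the appendix does not restate which per-checksum quantity is counted in the 65 classes from 0 to 64. Assuming, for illustration only, that it is the number of 1 bits in each 64-bit checksum (or the number of bit positions in which two checksums differ), an ideal checksum would make that count follow Binomial(64, 1/2), which is the symmetric shape of the "calculated" column. A hypothetical C tally of that kind:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 1 bits in a 64-bit value; each pass clears the lowest set bit. */
int count_ones64(uint64_t x)
{
    int n = 0;

    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Fill hist[0..64] with how many of the `count` checksums have exactly
   k bits set.  To tally differing bits between two checksums a and b
   instead, pass the values a ^ b. */
void tally_ones(const uint64_t *sums, size_t count, unsigned long hist[65])
{
    for (int k = 0; k <= 64; k++)
        hist[k] = 0;
    for (size_t i = 0; i < count; i++)
        hist[count_ones64(sums[i])]++;
}
```

The resulting histogram would play the role of the "observed" column, to be compared against the binomial expected counts with the chi-square terms used in these tables.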
Appendix  -  Program  Code


/* [Opening comment largely illegible in the source scan; the legible
   fragments refer to the program and its structure.]

   Note: if a different [illegible] other than mdc4 is used, the
   [illegible] function to checksum must be changed. */


[line illegible]

#include "[file name illegible]"
[declaration illegible]();
/* variables for data */

/* [illegible] for 32 bit random number */

/* [comment illegible] */

/* [comment illegible] */

long int [declaration illegible];   /* [illegible] character to end program */

/* variables and initial values for the equations
   with and without tss */

long int m1a = [illegible], m2a = 32707;

long int [further declarations and initial values illegible];


/* [comment illegible] */


long int modl(p, n)   /* [illegible] modular arithmetic */

long int p, n;
{
    long int r;
    r = p % n;
    if (r < 0) {
        r = r + n; }
    return r;
}
Appendix  -  Program  Code  (cont)

/* [Lines largely illegible in the source scan: declarations and a final
   modl() update belonging to the preceding routine.] */

void mdc32_2eq_tss()   /* 32 bit 2 equation with tss */
{   /* [comment illegible] */
    long int m1bst, m2bst;
    long int m1bs, m2bs;
    long tss;

    tss = ([expression illegible] | [expression illegible]);
    m1bst = [expression illegible];
    m1bs = modl([arguments illegible]);
    m2bst = [expression illegible];
    m2bs = modl([arguments illegible]);

    m1b = m1bs;
    m2b = m2bs;
}


void getrand()   /* [comment illegible] */
/* function that gets the random number */
{
    /* [Statements largely illegible in the source scan: t[0] to t[3] are
       set from shifted and masked parts of the random value a, for
       example (a >> 16) and a mask of 0xffff.] */
}


/* [comment illegible] */
Appendix  -  Program  Code  (cont)


/* [illegible] (additional feedback) */

long int tss;

tss = ([expression illegible] | [expression illegible]);

/* [Remaining statements largely illegible in the source scan: for the
   second, third and fourth equations they appear to reduce a combination
   of array terms minus tss with modl(), then copy the results back to
   the equation variables.] */
}

/* [illegible] at start of [illegible] program */


void rinit()   /* [comment illegible] */
{
    /* [m1a, m2a and further initial values are reset here; numbers
       illegible] */
    m1c = 229; m2c = 113; m3c = 227; m4c = 127;
    /* [a second set of four values, variable names partly illegible:
       229, 113, 227, 127] */
}
Appendix  -  Program  Code  (cont)


void mdc32_4eq()   /* 32 bit 4 equation mdc */
{
    long int m1cst, m2cst, m3cst, m4cst;
    long int m1cs, m2cs, m3cs, m4cs;

    m1cst = ([array terms illegible]);
    m1cs = modl([arguments illegible]);
    m2cst = ([array terms illegible]);
    m2cs = modl([arguments illegible]);
    m3cst = ([array terms illegible]);
    m3cs = modl([arguments illegible]);
    m4cst = ([array terms illegible]);
    m4cs = modl([arguments illegible]);

    /* [assignments of the new values back to the equation variables,
       partly illegible] */
}


Appendix  -  Program  Code  (cont) 


main()
{
    int thiscnt, ncnt, cnt, cnt1;   /* counters to keep track of place in
                                       program (initial values partly
                                       illegible) */
    int cont;
    unsigned [declaration illegible];
    unsigned [declaration illegible];

    fnt = fopen("[file name illegible]", "w");

    for (cnt = 0; [loop header illegible]) [statement illegible];   /* initialize [illegible] seed */
    cnt = 0;

    [call illegible];   /* initialize the [illegible] generator */

    cont = TRUE;
    while (cont)
    {
        getrand();   /* get the random [illegible] */
        cnt++;

        if ([condition illegible] || thiscnt++ == ncnt) {   /* end of a prgm */
            ncnt = t[2];
            thiscnt = 0;
            cnt1++;
            [statement illegible];   /* make chksum */
            fprintf(fnt, [format and arguments illegible]);
            if ([condition illegible]) cont = FALSE;   /* is it the last prgm */
            if ([condition illegible]) { [illegible]; rinit(); cnt = 0; }
        }

        /* mdc32_2eq(); */        /* 32 bit 2 equation mdc */
        /* mdc32_2eq_tss(); */    /* 32 bit 2 equation mdc */
        mdc32_4eq();              /* 32 bit 4 equation mdc */
        /* mdc32_4eq_tss(); */    /* 32 bit 4 equation mdc */
    }
}
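The listing's modular-arithmetic helper is only partly legible in the scan; it appears to compute p % n and add n when the remainder is negative, so that each equation keeps working with a value between 0 and n - 1. A minimal sketch of that behavior in modern C, assuming the name modl and long operands:

```c
/* Remainder of p modulo n, always in 0 .. n-1 for n > 0.
   Since C99, integer division truncates toward zero, so p % n takes the
   sign of p; a negative remainder is moved into range by adding n once. */
long modl(long p, long n)
{
    long r = p % n;

    if (r < 0)
        r += n;
    return r;
}
```

For example, modl(-7, 5) is 3, whereas -7 % 5 is -2 in C99 and later. The correction matters because the checksum equations appear to subtract terms (including tss) before reducing, so intermediate values can be negative.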
ABSTRACT

This thesis deals with the construction and testing of checksum algorithms for computer
virus detection on small computer systems. Checksum algorithms need to produce
checksums with the following features: an even mapping over the range of possible
checksums, permutation dependency, and every bit of the checksum being an overdetermined
function of all the bits of the set of data being checksummed. Checksum algorithms that
protect against viruses also need to be noninvertible and to produce checksums of
adequate length, because viruses can employ either a brute-force or a trapdoor attack
against the checksum. A birthday attack was shown not to be applicable in the case of
strong checksum algorithms. The methods used to construct checksum algorithms with these
properties include substitution, transposition, and feedback. Cryptographic checksum
algorithms were found to be too inefficient for small computers, so effort was
concentrated on noncryptographic algorithms. Several noncryptographic checksum algorithms
were created and shown to have the necessary features. These algorithms were also tested
for efficiency (speed of execution). On the basis of the strength and efficiency of the
checksum algorithms, a recommendation of checksum algorithms for different types of small
computers was presented.
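The abstract lists permutation dependency among the required checksum features. As a small illustration that is not one of the thesis algorithms, the sketch below contrasts a plain byte sum, which cannot detect reordered data, with Fletcher-16, a well-known position-dependent checksum. Both are easily inverted, so neither would meet the noninvertibility requirement; the example shows only the ordering property.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain additive checksum: the sum of all bytes modulo 2^16.
   Reordering the bytes cannot change it. */
uint16_t additive_sum(const uint8_t *data, size_t len)
{
    uint16_t s = 0;

    for (size_t i = 0; i < len; i++)
        s = (uint16_t)(s + data[i]);
    return s;
}

/* Fletcher-16: the second running sum accumulates the first, so each
   byte's influence depends on its position and reordering usually
   changes the result. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;

    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}
```

For the bytes {1, 2, 3} and {3, 2, 1}, additive_sum returns 6 for both, while fletcher16 returns 0x0A06 and 0x0E06 respectively.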


